Preface


Welcome to Collaborative Statistics, presented by Connexions. The initial section below introduces you to
Connexions. If you are familiar with Connexions, please skip to the section About "Collaborative Statistics."

About  Connexions 

Connexions  Modular  Content 

Connexions (cnx.org) is an online, open access educational resource dedicated to providing high quality
learning  materials  free  online,  free  in  printable  PDF  format,  and  at  low  cost  in  bound  volumes  through 
print-on-demand  publishing.  The  Collaborative  Statistics  textbook  is  one  of  many  collections  available 
to  Connexions  users.  Each  collection  is  composed  of  a  number  of  re-usable  learning  modules  written  in 
the  Connexions  XML  markup  language.  Each  module  may  also  be  re-used  (or  're-purposed')  as  part  of 
other  collections  and  may  be  used  outside  of  Connexions.  Including  Collaborative  Statistics,  Connexions 
currently offers over 6500 modules and more than 350 collections.

The modules of Collaborative Statistics are derived from the original paper version of the textbook under
the same title, Collaborative Statistics. Each module represents a self-contained concept from the original
work. Together, the modules comprise the original textbook.

Re-use  and  Customization 

The Creative Commons (CC) Attribution license applies to all Connexions modules. Under this license,
any module in Connexions may be used or modified for any purpose as long as proper attribution to the
original author(s) is maintained. Connexions' authoring tools make re-use (or re-purposing) easy. Therefore,
instructors anywhere are permitted to create customized versions of the Collaborative Statistics textbook
by editing modules, deleting unneeded modules, and adding their own supplementary modules.
Connexions' authoring tools keep track of these changes and maintain the CC license's required attribution
to the original authors. This process creates a new collection that can be viewed online, downloaded as a
single PDF file, or ordered in any quantity by instructors and students as a low-cost printed textbook. To
start building custom collections, please visit the help page, "Create a Collection with Existing Modules".
For a guide to authoring modules, please look at the help page, "Create a Module in Minutes".

Read  the  book  online,  print  the  PDF,  or  buy  a  copy  of  the  book. 

To browse the Collaborative Statistics textbook online, visit the collection home page at
cnx.org/content/col10522/latest. You will then have three options.

This content is available online at <http://cnx.org/content/m16026/1.16/>.

http://cnx.org/

http://creativecommons.org/licenses/by/2.0/
http://cnx.org/help/CreateCollection
http://cnx.org/help/ModuleInMinutes

Collaborative Statistics <http://cnx.org/content/col10522/latest/>

Available for free at Connexions <http://cnx.org/content/col10522/1.40>
1. You may obtain a PDF of the entire textbook to print or view offline by clicking on the "Download
PDF" link in the "Content Actions" box.

2. You may order a bound copy of the collection by clicking on the "Order Printed Copy" button.

3. You may view the collection modules online by clicking on the "Start" link, which takes you to the
first module in the collection. You can then navigate through the subsequent modules by using their
"Next" and "Previous" links to move forward and backward in the collection. You can jump to
any module in the collection by clicking on that module's title in the "Collection Contents" box on the
left side of the window. If these contents are hidden, make them visible by clicking on "[show table
of contents]".

Accessibility  and  Section  508  Compliance 

• For information on general Connexions accessibility features, please visit
http://cnx.org/content/m17212/latest/.

• For information on accessibility features specific to the Collaborative Statistics textbook, please visit
http://cnx.org/content/m17211/latest/.

Version  Change  History  and  Errata 

• For a list of modifications, updates, and corrections, please visit
http://cnx.org/content/m17360/latest/.

Adoption  and  Usage 

• The Collaborative Statistics collection has been adopted and customized by a number of professors
and educators for use in their classes. For a list of known versions and adopters, please visit
http://cnx.org/content/m18261/latest/.

About "Collaborative Statistics"

Collaborative Statistics was written by Barbara Illowsky and Susan Dean, faculty members at De Anza College
in Cupertino, California. The textbook was developed over several years and has been used in regular
and honors-level classroom settings and in distance learning classes. Courses using this textbook have been
articulated by the University of California for transfer of credit. The textbook contains full materials for
course offerings, including expository text, examples, labs, homework, and projects. A Teacher's Guide is
currently available in print form and on the Connexions site at http://cnx.org/content/col10547/latest/,
and supplemental course materials including additional problem sets and video lectures are available at
http://cnx.org/content/col10586/latest/. The on-line text for each of these collections will
meet the Section 508 standards for accessibility.

An  on-line  course  based  on  the  textbook  was  also  developed  by  Illowsky  and  Dean.  It  has  won  an  award 
as  the  best  on-line  California  community  college  course.  The  on-line  course  will  be  available  at  a  later  date 
as  a  collection  in  Connexions,  and  each  lesson  in  the  on-line  course  will  be  linked  to  the  on-line  textbook 
chapter. The on-line course will include, in addition to expository text and examples, videos of course
lectures  in  captioned  and  non-captioned  format. 

The original preface to the book, as written by Professors Illowsky and Dean, now follows:


"Accessibility Features of Connexions" <http://cnx.org/content/m17212/latest/>
"Collaborative Statistics: Accessibility" <http://cnx.org/content/m17211/latest/>
"Collaborative Statistics: Change History" <http://cnx.org/content/m17360/latest/>
"Collaborative Statistics: Adoption and Usage" <http://cnx.org/content/m18261/latest/>
Collaborative Statistics Teacher's Guide <http://cnx.org/content/col10547/latest/>
Collaborative Statistics: Supplemental Course Materials <http://cnx.org/content/col10586/latest/>
This book is intended for introductory statistics courses being taken by students at two- and four-year
colleges who are majoring in fields other than math or engineering. Intermediate algebra is the only prerequisite.
The book focuses on applications of statistical knowledge rather than the theory behind it. The
text  is  named  Collaborative  Statistics  because  students  learn  best  by  doing.  In  fact,  they  learn  best  by 
working  in  small  groups.  The  old  saying  "two  heads  are  better  than  one"  truly  applies  here. 

Our  emphasis  in  this  text  is  on  four  main  concepts: 

•  thinking  statistically 

•  incorporating  technology 

•  working  collaboratively 

•  writing  thoughtfully 

These concepts are integral to our course. Students learn best by actively participating, not by just
watching  and  listening.  Teaching  should  be  highly  interactive.  Students  need  to  be  thoroughly  engaged 
in  the  learning  process  in  order  to  make  sense  of  statistical  concepts.  Collaborative  Statistics  provides 
techniques  for  students  to  write  across  the  curriculum,  to  collaborate  with  their  peers,  to  think  statistically, 
and  to  incorporate  technology. 

This  book  takes  students  step  by  step.  The  text  is  interactive.  Therefore,  students  can  immediately  apply 
what  they  read.  Once  students  have  completed  the  process  of  problem  solving,  they  can  tackle  interesting 
and  challenging  problems  relevant  to  today's  world.  The  problems  require  the  students  to  apply  their 
newly found skills. In addition, technology (TI-83 graphing calculators are highlighted) is incorporated
throughout  the  text  and  the  problems,  as  well  as  in  the  special  group  activities  and  projects.  The  book  also 
contains  labs  that  use  real  data  and  practices  that  lead  students  step  by  step  through  the  problem  solving 
process. 

At  De  Anza,  along  with  hundreds  of  other  colleges  across  the  country,  the  college  audience  involves  a 
large  number  of  ESL  students  as  well  as  students  from  many  disciplines.  The  ESL  students,  as  well  as 
the  non-ESL  students,  have  been  especially  appreciative  of  this  text.  They  find  it  extremely  readable  and 
understandable.  Collaborative  Statistics  has  been  used  in  classes  that  range  from  20  to  120  students,  and  in 
regular,  honor,  and  distance  learning  classes. 

Susan  Dean 

Barbara Illowsky
Additional Resources Currently Available

• Glossary

• View or Download This Textbook Online

• Collaborative Statistics Teacher's Guide

• Supplemental Materials

• Video Lectures

• Version History

• Textbook Adoption and Usage

• Additional Technologies and Notes

• Accessibility and Section 508 Compliance

The  following  section  describes  some  additional  resources  for  learners  and  educators.  These  modules  and 
collections are all available on the Connexions website (http://cnx.org/) and can be viewed online,
downloaded,  printed,  or  ordered  as  appropriate. 

Glossary 

This module contains the entire glossary for the Collaborative Statistics textbook collection (col10522)
since its initial release on 15 July 2008. The glossary is located at http://cnx.org/content/m16129/latest/.

Below  are  links  to  additional  resources: 

Link to the Statistics Glossary by Dr. Philip Stark, UC Berkeley
http://statistics.berkeley.edu/~stark/SticiGui/Text/gloss.htm

Link to Wikipedia
http://www.wikipedia.org/

(Search on "Glossary of probability and statistics.")
Student Welcome Letter


Dear  Student: 

Have  you  heard  others  say,  "You're  taking  statistics?  That's  the  hardest  course  I  ever  took!"  They  say  that, 
because  they  probably  spent  the  entire  course  confused  and  struggling.  They  were  probably  lectured  to 
and  never  had  the  chance  to  experience  the  subject.  You  will  not  have  that  problem.  Let's  find  out  why. 

There  is  a  Chinese  Proverb  that  describes  our  feelings  about  the  field  of  statistics: 

I  HEAR,  AND  I  FORGET 

I  SEE,  AND  I  REMEMBER 

I  DO,  AND  I  UNDERSTAND 

Statistics  is  a  "do"  field.  In  order  to  learn  it,  you  must  "do"  it.  We  have  structured  this  book  so  that  you  will 
have  hands-on  experiences.  They  will  enable  you  to  truly  understand  the  concepts  instead  of  merely  going 
through  the  requirements  for  the  course. 

What makes this book different from other texts? First, we have eliminated the drudgery of tedious calculations.
You might be using computers or graphing calculators so that you do not need to struggle with
algebraic manipulations. Second, this course is taught as a collaborative activity. With others in your class,
you will work toward the common goal of learning this material.

Here  are  some  hints  for  success  in  your  class: 

•  Work  hard  and  work  every  night. 

•  Form  a  study  group  and  learn  together. 

•  Don't  get  discouraged  -  you  can  do  it! 

•  As  you  solve  problems,  ask  yourself,  "Does  this  answer  make  sense?" 

•  Many  statistics  words  have  the  same  meaning  as  in  everyday  English. 

•  Go  to  your  teacher  for  help  as  soon  as  you  need  it. 

•  Don't  get  behind. 

•  Read  the  newspaper  and  ask  yourself,  "Does  this  article  make  sense?" 

•  Draw  pictures  -  they  truly  help! 

Good  luck  and  don't  give  up! 
Sincerely, 

Susan Dean and Barbara Illowsky


This content is available online at <http://cnx.org/content/m16305/1.5/>.
Chapter 1

Sampling  and  Data 


1.1 Sampling and Data

1.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

•  Recognize  and  differentiate  between  key  terms. 

•  Apply  various  types  of  sampling  methods  to  data  collection. 

•  Create  and  interpret  frequency  tables. 

1.1.2  Introduction 

You are probably asking yourself the question, "When and where will I use statistics?" If you read any
newspaper, watch television, or use the Internet, you will see statistical information. There are statistics
about  crime,  sports,  education,  politics,  and  real  estate.  Typically,  when  you  read  a  newspaper  article  or 
watch  a  news  program  on  television,  you  are  given  sample  information.  With  this  information,  you  may 
make  a  decision  about  the  correctness  of  a  statement,  claim,  or  "fact."  Statistical  methods  can  help  you  make 
the  "best  educated  guess." 

Since  you  will  undoubtedly  be  given  statistical  information  at  some  point  in  your  life,  you  need  to  know 
some  techniques  to  analyze  the  information  thoughtfully.  Think  about  buying  a  house  or  managing  a 
budget.  Think  about  your  chosen  profession.  The  fields  of  economics,  business,  psychology,  education, 
biology,  law,  computer  science,  police  science,  and  early  childhood  development  require  at  least  one  course 
in  statistics. 

Included in this chapter are the basic ideas and words of probability and statistics. You will soon understand
that statistics and probability work together. You will also learn how data are gathered and what
"good"  data  are. 

1.2 Statistics

The  science  of  statistics  deals  with  the  collection,  analysis,  interpretation,  and  presentation  of  data.  We  see 
and  use  data  in  our  everyday  lives. 

This content is available online at <http://cnx.org/content/m16008/1.9/>.
This content is available online at <http://cnx.org/content/m16020/1.16/>.


1.2.1  Optional  Collaborative  Classroom  Exercise 

In  your  classroom,  try  this  exercise.  Have  class  members  write  down  the  average  time  (in  hours,  to  the 
nearest  half-hour)  they  sleep  per  night.  Your  instructor  will  record  the  data.  Then  create  a  simple  graph 
(called  a  dot  plot)  of  the  data.  A  dot  plot  consists  of  a  number  line  and  dots  (or  points)  positioned  above 
the  number  line.  For  example,  consider  the  following  data: 

5;  5.5;  6;  6;  6;  6.5;  6.5;  6.5;  6.5;  7;  7;  8;  8;  9 

The  dot  plot  for  this  data  would  be  as  follows: 

[Figure 1.1: dot plot, "Frequency of Average Time (in Hours) Spent Sleeping per Night"]
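As a rough illustration of how the dot plot above is built, the following Python sketch stacks one "o" above a number line for each data value in the example. The function name `dot_plot`, the half-hour `step`, and the text layout are assumptions made for this sketch and are not part of the textbook.

```python
from collections import Counter

def dot_plot(values, step=0.5):
    """Return a text dot plot: one 'o' stacked above the number line per data value."""
    counts = Counter(values)
    low, high = min(values), max(values)
    # Evenly spaced ticks from the smallest to the largest value (assumed spacing: step).
    n_ticks = int(round((high - low) / step)) + 1
    ticks = [low + i * step for i in range(n_ticks)]
    labels = [f"{t:g}" for t in ticks]
    width = max(len(label) for label in labels) + 1
    rows = []
    # Build the plot from the tallest stack down to the row just above the axis.
    for level in range(max(counts.values()), 0, -1):
        cells = ("o" if counts[t] >= level else " " for t in ticks)
        rows.append("".join(c.ljust(width) for c in cells).rstrip())
    rows.append("".join(label.ljust(width) for label in labels).rstrip())
    return "\n".join(rows)

# Sleep data from the example (hours per night, to the nearest half-hour).
sleep_hours = [5, 5.5, 6, 6, 6, 6.5, 6.5, 6.5, 6.5, 7, 7, 8, 8, 9]
print(dot_plot(sleep_hours))
```

The tallest stack sits above 6.5, which is where the example data cluster.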


Does  your  dot  plot  look  the  same  as  or  different  from  the  example?  Why?  If  you  did  the  same  example  in 
an  English  class  with  the  same  number  of  students,  do  you  think  the  results  would  be  the  same?  Why  or 
why  not? 

Where  do  your  data  appear  to  cluster?  How  could  you  interpret  the  clustering? 

The  questions  above  ask  you  to  analyze  and  interpret  your  data.  With  this  example,  you  have  begun  your 
study  of  statistics. 

In  this  course,  you  will  learn  how  to  organize  and  summarize  data.  Organizing  and  summarizing  data  is 
called  descriptive  statistics.  Two  ways  to  summarize  data  are  by  graphing  and  by  numbers  (for  example, 
finding  an  average).  After  you  have  studied  probability  and  probability  distributions,  you  will  use  formal 
methods  for  drawing  conclusions  from  "good"  data.  The  formal  methods  are  called  inferential  statistics. 
Statistical  inference  uses  probability  to  determine  how  confident  we  can  be  that  the  conclusions  are  correct. 

Effective  interpretation  of  data  (inference)  is  based  on  good  procedures  for  producing  data  and  thoughtful 
examination  of  the  data.  You  will  encounter  what  will  seem  to  be  too  many  mathematical  formulas  for 
interpreting  data.  The  goal  of  statistics  is  not  to  perform  numerous  calculations  using  the  formulas,  but  to 
gain  an  understanding  of  your  data.  The  calculations  can  be  done  using  a  calculator  or  a  computer.  The 
understanding  must  come  from  you.  If  you  can  thoroughly  grasp  the  basics  of  statistics,  you  can  be  more 
confident  in  the  decisions  you  make  in  life. 
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1.2.2  Levels  of  Measurement  and  Statistical  Operations 

The way a set of data is measured is called its level of measurement. Correct statistical procedures depend
on  a  researcher  being  familiar  with  levels  of  measurement.  Not  every  statistical  operation  can  be  used  with 
every  set  of  data.  Data  can  be  classified  into  four  levels  of  measurement.  They  are  (from  lowest  to  highest 
level): 

•  Nominal  scale  level 

•  Ordinal  scale  level 

•  Interval  scale  level 

•  Ratio  scale  level 

Data  that  is  measured  using  a  nominal  scale  is  qualitative.  Categories,  colors,  names,  labels  and  favorite 
foods  along  with  yes  or  no  responses  are  examples  of  nominal  level  data.  Nominal  scale  data  are  not 
ordered.  For  example,  trying  to  classify  people  according  to  their  favorite  food  does  not  make  any  sense. 
Putting  pizza  first  and  sushi  second  is  not  meaningful. 

Smartphone companies are another example of nominal scale data. Some examples are Sony, Motorola,
Nokia, Samsung and Apple. This is just a list and there is no agreed upon order. Some people may
favor  Apple  but  that  is  a  matter  of  opinion.  Nominal  scale  data  cannot  be  used  in  calculations. 

Data that is measured using an ordinal scale is similar to nominal scale data but there is a big difference.
The ordinal scale data can be ordered. An example of ordinal scale data is a list of the top five
national parks in the United States. The top five national parks in the United States can be ranked from one
to five but we cannot measure differences between the data.

Another  example  using  the  ordinal  scale  is  a  cruise  survey  where  the  responses  to  questions  about 
the  cruise  are  "excellent,"  "good,"  "satisfactory"  and  "unsatisfactory."  These  responses  are  ordered  from 
the  most  desired  response  by  the  cruise  lines  to  the  least  desired.  But  the  differences  between  two  pieces  of 
data cannot be measured. Like the nominal scale data, ordinal scale data cannot be used in calculations.

Data that is measured using the interval scale is similar to ordinal level data because it has a definite
ordering, but in addition the differences between data values can be measured. Interval scale data
does not, however, have a true starting point (a meaningful zero).

Temperature scales like Celsius (C) and Fahrenheit (F) are measured by using the interval scale. In
both temperature measurements, 40 degrees is equal to 100 degrees minus 60 degrees. Differences
make sense. But 0 degrees is not a true starting point because, in both scales, 0 is not the absolute lowest temperature.
Temperatures like -10° F and -15° C exist and are colder than 0.

Interval level data can be used in calculations but one type of comparison cannot be done. Eighty
degrees  C  is  not  4  times  as  hot  as  20°  C  (nor  is  80°  F  4  times  as  hot  as  20°  F).  There  is  no  meaning  to  the 
ratio  of  80  to  20  (or  4  to  1). 
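One way to see why the ratio has no meaning is to express the same two temperatures in different units. The sketch below is only an illustration: the helper function names are made up here, and the Kelvin scale (which does start at absolute zero) is brought in as an outside comparison that the text does not discuss.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

def celsius_to_kelvin(c):
    """Convert degrees Celsius to kelvins (a scale whose 0 is absolute zero)."""
    return c + 273.15

hot_c, cool_c = 80.0, 20.0

# Differences are meaningful on an interval scale: 60 Celsius degrees, 108 Fahrenheit degrees.
diff_c = hot_c - cool_c
diff_f = celsius_to_fahrenheit(hot_c) - celsius_to_fahrenheit(cool_c)

# The "ratio" changes with the choice of unit, so it carries no physical meaning.
ratio_c = hot_c / cool_c
ratio_f = celsius_to_fahrenheit(hot_c) / celsius_to_fahrenheit(cool_c)
ratio_k = celsius_to_kelvin(hot_c) / celsius_to_kelvin(cool_c)

print(f"difference: {diff_c:g} C degrees = {diff_f:g} F degrees")
print(f"ratio in Celsius:    {ratio_c:.2f}")
print(f"ratio in Fahrenheit: {ratio_f:.2f}")
print(f"ratio in kelvins:    {ratio_k:.2f}")
```

The same pair of temperatures gives three different ratios, while the difference converts cleanly from one scale to the other.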

Data  that  is  measured  using  the  ratio  scale  takes  care  of  the  ratio  problem  and  gives  you  the  most 
information.  Ratio  scale  data  is  like  interval  scale  data  but,  in  addition,  it  has  a  0  point  and  ratios  can  be 
calculated.  For  example,  four  multiple  choice  statistics  final  exam  scores  are  80,  68,  20  and  92  (out  of  a 
possible  100  points).  The  exams  were  machine-graded. 

The  data  can  be  put  in  order  from  lowest  to  highest:  20,  68,  80,  92. 

The  differences  between  the  data  have  meaning.  The  score  92  is  more  than  the  score  68  by  24  points. 
Ratios can be calculated. The smallest score for ratio data is 0. So 80 is 4 times 20. The score of 80
is 4 times better than the score of 20.
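The three operations described for the exam scores (ordering, differences, and ratios) can be checked with a few lines of Python. This is only a sketch; the variable names are chosen for illustration.

```python
# The four machine-graded exam scores from the text (out of 100 points).
scores = [80, 68, 20, 92]

# Ordering: ratio scale data can be sorted from lowest to highest.
ordered = sorted(scores)

# Differences: 92 is more than 68 by this many points.
difference = 92 - 68

# Ratios: because 0 is a true starting point, 80 / 20 is meaningful.
ratio = 80 / 20

print(ordered)
print(difference)
print(ratio)
```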

Exercises 

What  type  of  measure  scale  is  being  used?  Nominal,  Ordinal,  Interval  or  Ratio. 

1. High school men's soccer players classified by their athletic ability: Superior, Average, Above average.

2. Baking temperatures for various main dishes: 350, 400, 325, 250, 300.

3.  The  colors  of  crayons  in  a  24-crayon  box. 

4.  Social  security  numbers. 

5. Incomes measured in dollars.

6. A satisfaction survey of a social website by number: 1 = very satisfied, 2 = somewhat satisfied, 3 = not satisfied.

7. Political outlook: extreme left, left-of-center, right-of-center, extreme right.

8.  Time  of  day  on  an  analog  watch. 

9.  The  distance  in  miles  to  the  closest  grocery  store. 

10. The dates 1066, 1492, 1644, 1947, 1944.

11.  The  heights  of  21  -  65  year-old  women. 

12.  Common  letter  grades  A,  B,  C,  D,  F. 

Answers  1.  ordinal,  2.  interval,  3.  nominal,  4.  nominal,  5.  ratio,  6.  ordinal,  7.  nominal,  8.  interval,  9.  ratio, 
10.  interval,  11.  ratio,  12.  ordinal 

1.3 Probability

Probability  is  a  mathematical  tool  used  to  study  randomness.  It  deals  with  the  chance  (the  likelihood)  of 
an  event  occurring.  For  example,  if  you  toss  a  fair  coin  4  times,  the  outcomes  may  not  be  2  heads  and  2 
tails.  However,  if  you  toss  the  same  coin  4,000  times,  the  outcomes  will  be  close  to  half  heads  and  half  tails. 
The expected theoretical probability of heads in any one toss is $\frac{1}{2}$ or 0.5. Even though the outcomes of a
few repetitions are uncertain, there is a regular pattern of outcomes when there are many repetitions. After
reading about the English statistician Karl Pearson who tossed a coin 24,000 times with a result of 12,012
heads, one of the authors tossed a coin 2,000 times. The results were 996 heads. The fraction $\frac{996}{2000}$ is equal
to 0.498 which is very close to 0.5, the expected probability.
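The pattern described above can be explored with a small simulation. The sketch below uses Python's pseudo-random number generator as a stand-in for a fair coin; the function name `fraction_heads`, the seed, and the list of toss counts are arbitrary choices for illustration, and the simulated fractions will not match the historical counts exactly.

```python
import random

def fraction_heads(n_tosses, rng):
    """Simulate n_tosses of a fair coin and return the fraction that land heads."""
    heads = sum(rng.random() < 0.5 for _ in range(n_tosses))
    return heads / n_tosses

# Observed fractions reported in the text.
pearson_fraction = 12012 / 24000
author_fraction = 996 / 2000
print("Pearson:", pearson_fraction, " author:", author_fraction)

# Simulated fractions for a few and for many tosses (seed chosen arbitrarily).
rng = random.Random(2024)
for n in (4, 40, 4000, 24000):
    print(n, "tosses ->", fraction_heads(n, rng))
```

With only a few tosses the fraction can land far from 0.5; with thousands of tosses it usually settles close to 0.5.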

The  theory  of  probability  began  with  the  study  of  games  of  chance  such  as  poker.  Predictions  take  the  form 
of  probabilities.  To  predict  the  likelihood  of  an  earthquake,  of  rain,  or  whether  you  will  get  an  A  in  this 
course,  we  use  probabilities.  Doctors  use  probability  to  determine  the  chance  of  a  vaccination  causing  the 
disease  the  vaccination  is  supposed  to  prevent.  A  stockbroker  uses  probability  to  determine  the  rate  of 
return  on  a  client's  investments.  You  might  use  probability  to  decide  to  buy  a  lottery  ticket  or  not.  In  your 
study of statistics, you will use the power of mathematics through probability calculations to analyze and
interpret  your  data. 

1.4 Key Terms

In  statistics,  we  generally  want  to  study  a  population.  You  can  think  of  a  population  as  an  entire  collection 
of  persons,  things,  or  objects  under  study.  To  study  the  larger  population,  we  select  a  sample.  The  idea  of 
sampling  is  to  select  a  portion  (or  subset)  of  the  larger  population  and  study  that  portion  (the  sample)  to 
gain information about the population. Data are the result of sampling from a population.

This content is available online at <http://cnx.org/content/m16015/1.11/>.
This content is available online at <http://cnx.org/content/m16007/1.17/>.
Because it takes a lot of time and money to examine an entire population, sampling is a very practical
technique.  If  you  wished  to  compute  the  overall  grade  point  average  at  your  school,  it  would  make  sense 
to  select  a  sample  of  students  who  attend  the  school.  The  data  collected  from  the  sample  would  be  the 
students'  grade  point  averages.  In  presidential  elections,  opinion  poll  samples  of  1,000  to  2,000  people  are 
taken. The opinion poll is supposed to represent the views of the people in the entire country. Manufacturers
of canned carbonated drinks take samples to determine if a 16 ounce can contains 16 ounces of
carbonated  drink. 

From  the  sample  data,  we  can  calculate  a  statistic.  A  statistic  is  a  number  that  is  a  property  of  the  sample. 
For  example,  if  we  consider  one  math  class  to  be  a  sample  of  the  population  of  all  math  classes,  then  the 
average  number  of  points  earned  by  students  in  that  one  math  class  at  the  end  of  the  term  is  an  example  of 
a statistic. The statistic is an estimate of a population parameter. A parameter is a number that is a property
of  the  population.  Since  we  considered  all  math  classes  to  be  the  population,  then  the  average  number  of 
points  earned  per  student  over  all  the  math  classes  is  an  example  of  a  parameter. 

One  of  the  main  concerns  in  the  field  of  statistics  is  how  accurately  a  statistic  estimates  a  parameter.  The 
accuracy  really  depends  on  how  well  the  sample  represents  the  population.  The  sample  must  contain  the 
characteristics  of  the  population  in  order  to  be  a  representative  sample.  We  are  interested  in  both  the 
sample statistic and the population parameter in inferential statistics. In a later chapter, we will use the
sample statistic to test the validity of the established population parameter.
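To make the distinction between a parameter and a statistic concrete, the sketch below builds a made-up population of point totals, computes its mean (the parameter), and then estimates it from a random sample (the statistic). The population, its size, the sample size, and the seed are all invented for illustration.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed so the sketch is repeatable

# Hypothetical population: points earned by 2,000 math students.
population = [rng.randint(40, 100) for _ in range(2000)]

# Parameter: a number that is a property of the whole population.
parameter = statistics.mean(population)

# Statistic: the same calculation on a simple random sample of 50 students.
sample = rng.sample(population, 50)
statistic = statistics.mean(sample)

print("population mean (parameter):", round(parameter, 2))
print("sample mean (statistic):    ", round(statistic, 2))
```

How close the statistic comes to the parameter depends on how well the sample represents the population, which is why the way a sample is chosen matters.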

A  variable,  notated  by  capital  letters  like  X  and  Y,  is  a  characteristic  of  interest  for  each  person  or  thing  in 
a  population.  Variables  may  be  numerical  or  categorical.  Numerical  variables  take  on  values  with  equal 
units  such  as  weight  in  pounds  and  time  in  hours.  Categorical  variables  place  the  person  or  thing  into  a 
category.  If  we  let  X  equal  the  number  of  points  earned  by  one  math  student  at  the  end  of  a  term,  then  X 
is  a  numerical  variable.  If  we  let  Y  be  a  person's  party  affiliation,  then  examples  of  Y  include  Republican, 
Democrat,  and  Independent.  Y  is  a  categorical  variable.  We  could  do  some  math  with  values  of  X  (calculate 
the  average  number  of  points  earned,  for  example),  but  it  makes  no  sense  to  do  math  with  values  of  Y 
(calculating an average party affiliation makes no sense).
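The difference between the two kinds of variable shows up in what can sensibly be computed. In the sketch below, the five values of X and Y are invented for illustration: an average is computed for the numerical variable, while the categorical variable is summarized by counting how often each category occurs.

```python
from collections import Counter
import statistics

# Hypothetical values of X (points earned) and Y (party affiliation) for five people.
x_points = [86, 75, 92, 68, 80]
y_party = ["Republican", "Democrat", "Independent", "Democrat", "Democrat"]

# Numerical variable: arithmetic such as the mean is meaningful.
mean_points = statistics.mean(x_points)

# Categorical variable: count the categories instead of averaging them.
party_counts = Counter(y_party)
most_common_party = statistics.mode(y_party)

print("mean points:", mean_points)
print("party counts:", dict(party_counts))
print("most common party:", most_common_party)
```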

Data are the actual values of the variable. They may be numbers or they may be words. Datum is a single
value. 

Two  words  that  come  up  often  in  statistics  are  mean  and  proportion.  If  you  were  to  take  three  exams  in 
your  math  classes  and  obtained  scores  of  86,  75,  and  92,  you  calculate  your  mean  score  by  adding  the  three 
exam  scores  and  dividing  by  three  (your  mean  score  would  be  84.3  to  one  decimal  place).  If,  in  your  math 
class, there are 40 students and 22 are men and 18 are women, then the proportion of men students is $\frac{22}{40}$
and the proportion of women students is $\frac{18}{40}$. Mean and proportion are discussed in more detail in later
chapters. 
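The mean and the two proportions in this paragraph can be reproduced in a few lines of Python. This is a sketch with illustrative variable names; `Fraction` is used only to show each proportion in lowest terms.

```python
from fractions import Fraction

# Mean: add the three exam scores and divide by the number of scores.
exam_scores = [86, 75, 92]
mean_score = sum(exam_scores) / len(exam_scores)

# Proportion: the count in a group divided by the total count.
men, women = 22, 18
total = men + women
prop_men = Fraction(men, total)
prop_women = Fraction(women, total)

print("mean score:", round(mean_score, 1))
print("proportion of men:", prop_men, "=", float(prop_men))
print("proportion of women:", prop_women, "=", float(prop_women))
```

The two proportions add up to 1 because every student in the class falls into exactly one of the two groups.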

NOTE:  The  words  "mean"  and  "average"  are  often  used  interchangeably.  The  substitution  of  one 
word  for  the  other  is  common  practice.  The  technical  term  is  "arithmetic  mean"  and  "average"  is 
technically  a  center  location.  However,  in  practice  among  non-statisticians,  "average"  is  commonly 
accepted  for  "arithmetic  mean." 

Example  1.1 

Define the key terms from the following study: We want to know the average (mean) amount
of money first year college students spend at ABC College on school supplies that do not include
books. We randomly survey 100 first year students at the college. Three of those students spent
$150, $200, and $225, respectively.

Solution 

The  population  is  all  first  year  students  attending  ABC  College  this  term. 
The sample could be all students enrolled in one section of a beginning statistics course at ABC
College  (although  this  sample  may  not  represent  the  entire  population). 

The parameter is the average (mean) amount of money spent (excluding books) by first year college
students at ABC College this term.

The statistic is the average (mean) amount of money spent (excluding books) by first year college
students  in  the  sample. 

The variable could be the amount of money spent (excluding books) by one first year student.
Let X = the amount of money spent (excluding books) by one first year student attending ABC College.

The  data  are  the  dollar  amounts  spent  by  the  first  year  students.  Examples  of  the  data  are  $150, 
$200,  and  $225. 


1.4.1  Optional  Collaborative  Classroom  Exercise 

Do  the  following  exercise  collaboratively  with  up  to  four  people  per  group.  Find  a  population,  a  sample, 
the  parameter,  the  statistic,  a  variable,  and  data  for  the  following  study:  You  want  to  determine  the  average 
(mean)  number  of  glasses  of  milk  college  students  drink  per  day.  Suppose  yesterday,  in  your  English  class, 
you  asked  five  students  how  many  glasses  of  milk  they  drank  the  day  before.  The  answers  were  1,  0, 1,  3, 
and  4  glasses  of  milk. 

1.5 Data

Data  may  come  from  a  population  or  from  a  sample.  Small  letters  like  x  or  y  generally  are  used  to  represent 
data  values.  Most  data  can  be  put  into  the  following  categories: 

•  Qualitative 

•  Quantitative 

Qualitative data are the result of categorizing or describing attributes of a population. Hair color, blood
type, ethnic group, the car a person drives, and the street a person lives on are examples of qualitative data.
Qualitative data are generally described by words or letters. For instance, hair color might be black, dark
brown, light brown, blonde, gray, or red. Blood type might be AB+, O-, or B+. Researchers often prefer to
use quantitative data over qualitative data because it lends itself more easily to mathematical analysis. For
example, it does not make sense to find an average hair color or blood type.

Quantitative data are always numbers. Quantitative data are the result of counting or measuring attributes
of a population. Amount of money, pulse rate, weight, number of people living in your town, and the
number of students who take statistics are examples of quantitative data. Quantitative data may be either
discrete or continuous.

All  data  that  are  the  result  of  counting  are  called  quantitative  discrete  data.  These  data  take  on  only  certain 
numerical  values.  If  you  count  the  number  of  phone  calls  you  receive  for  each  day  of  the  week,  you  might 
get  0, 1, 2, 3,  etc. 

This content is available online at <http://cnx.org/content/m16005/1.18/>.
All data that are the result of measuring are quantitative continuous data assuming that we can measure
accurately. Measuring angles in radians might result in the numbers π/6, π/3, π/2, π, 3π/4, etc. If you and your
friends carry backpacks with books in them to school, the numbers of books in the backpacks are discrete
data and the weights of the backpacks are continuous data.

NOTE:  In  this  course,  the  data  used  is  mainly  quantitative.  It  is  easy  to  calculate  statistics  (like  the 
mean  or  proportion)  from  numbers.  In  the  chapter  Descriptive  Statistics,  you  will  be  introduced 
to  stem  plots,  histograms  and  box  plots  all  of  which  display  quantitative  data.  Qualitative  data  is 
discussed  at  the  end  of  this  section  through  graphs. 

Example  1.2:  Data  Sample  of  Quantitative  Discrete  Data 

The  data  are  the  number  of  books  students  carry  in  their  backpacks.  You  sample  five  students. 
Two  students  carry  3  books,  one  student  carries  4  books,  one  student  carries  2  books,  and  one 
student  carries  1  book.  The  numbers  of  books  (3, 4, 2,  and  1)  are  the  quantitative  discrete  data. 

Example  1.3:  Data  Sample  of  Quantitative  Continuous  Data 

The data are the weights of the backpacks with the books in them. You sample the same five students.
The weights (in pounds) of their backpacks are 6.2, 7, 6.8, 9.1, 4.3. Notice that backpacks carrying
three books can have different weights. Weights are quantitative continuous data because weights
are measured.

Example  1.4:  Data  Sample  of  Qualitative  Data 

The data are the colors of backpacks. Again, you sample the same five students. One student has
a red backpack, two students have black backpacks, one student has a green backpack, and one
student has a gray backpack. The colors red, black, black, green, and gray are qualitative data.

NOTE:  You  may  collect  data  as  numbers  and  report  it  categorically.  For  example,  the  quiz  scores 
for  each  student  are  recorded  throughout  the  term.  At  the  end  of  the  term,  the  quiz  scores  are 
reported  as  A,  B,  C,  D,  or  F. 

Example  1.5 

Work  collaboratively  to  determine  the  correct  data  type  (quantitative  or  qualitative).  Indicate 
whether  quantitative  data  are  continuous  or  discrete.  Hint:  Data  that  are  discrete  often  start  with 
the  words  "the  number  of." 

1.  The  number  of  pairs  of  shoes  you  own. 

2. The type of car you drive.

3.  Where  you  go  on  vacation. 

4.  The  distance  it  is  from  your  home  to  the  nearest  grocery  store. 

5.  The  number  of  classes  you  take  per  school  year. 

6. The tuition for your classes.

7. The type of calculator you use.

8.  Movie  ratings. 

9.  Political  party  preferences. 

10.  Weight  of  sumo  wrestlers. 

11.  Amount  of  money  won  playing  poker. 

12.  Number  of  correct  answers  on  a  quiz. 

13. People's attitudes toward the government.

14.  IQ  scores.  (This  may  cause  some  discussion.) 
Qualitative Data Discussion

Below are tables of part-time vs full-time students at De Anza College in Cupertino, CA and Foothill
College in Los Altos, CA for the Spring 2010 quarter. The tables display counts (frequencies) and percentages
or proportions (relative frequencies). The percent columns make comparing the same categories in the
colleges easier. Displaying percentages along with the numbers is often helpful, but it is particularly important
when comparing sets of data that do not have the same totals, such as the total enrollments for both
colleges in this example. Notice how much larger the percentage for part-time students at Foothill College is
compared to De Anza College.

De Anza College

             Number    Percent
Full-time     9,200     40.9%
Part-time    13,296     59.1%
Total        22,496    100%

Table 1.1

Foothill College

             Number    Percent
Full-time     4,059     28.6%
Part-time    10,124     71.4%
Total        14,183    100%

Table 1.2
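
The percent columns in Tables 1.1 and 1.2 can be reproduced from the counts. The following is a minimal sketch in Python (the language and the names `enrollment` and `relative_frequencies` are our own choices for illustration, not part of the original text). It divides each count by its college's total and rounds to one decimal place.

```python
# Counts copied from Tables 1.1 and 1.2 (Spring 2010 quarter).
enrollment = {
    "De Anza College": {"Full-time": 9200, "Part-time": 13296},
    "Foothill College": {"Full-time": 4059, "Part-time": 10124},
}


def relative_frequencies(counts):
    """Return each category's share of the total as a percent, to one decimal place."""
    total = sum(counts.values())
    return {category: round(100 * n / total, 1) for category, n in counts.items()}


percents = {college: relative_frequencies(counts) for college, counts in enrollment.items()}

for college, shares in percents.items():
    print(college, shares)
```

Because the two colleges have different total enrollments, the percents (not the raw counts) are what make the part-time categories directly comparable.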


Tables  are  a  good  way  of  organizing  and  displaying  data.  But  graphs  can  be  even  more  helpful  in 
understanding  the  data.  There  are  no  strict  rules  concerning  what  graphs  to  use.  Below  are  pie  charts  and 
bar  graphs,  two  graphs  that  are  used  to  display  qualitative  data. 

In  a  pie  chart,  categories  of  data  are  represented  by  wedges  in  the  circle  and  are  proportional  in  size 
to  the  percent  of  individuals  in  each  category. 

In  a  bar  graph,  the  length  of  the  bar  for  each  category  is  proportional  to  the  number  or  percent  of 
individuals  in  each  category.  Bars  may  be  vertical  or  horizontal. 

A  Pareto  chart  consists  of  bars  that  are  sorted  into  order  by  category  size  (largest  to  smallest). 

Look at the graphs and determine which graph (pie or bar) you think displays the comparisons
better. This is a matter of preference.

It  is  a  good  idea  to  look  at  a  variety  of  graphs  to  see  which  is  the  most  helpful  in  displaying  the 
data.  We  might  make  different  choices  of  what  we  think  is  the  "best"  graph  depending  on  the  data  and  the 
context.  Our  choice  also  depends  on  what  we  are  using  the  data  for. 
[Figure: two pie charts of full-time vs. part-time students. De Anza College: Part Time 59.1%. Foothill
College: Full Time 28.6%, Part Time 71.4%.]

Table 1.3

[Figure: bar graph titled "Student Status" comparing Full Time and Part Time students at De Anza and
Foothill.]

Table 1.4

Percentages  That  Add  to  More  (or  Less)  Than  100% 

Sometimes  percentages  add  up  to  be  more  than  100%  (or  less  than  100%).  In  the  graph,  the  percentages 
add  to  more  than  100%  because  students  can  be  in  more  than  one  category.  A  bar  graph  is  appropriate 
to  compare  the  relative  size  of  the  categories.  A  pie  chart  cannot  be  used.  It  also  could  not  be  used  if  the 
percentages  added  to  less  than  100%. 

De Anza College Spring 2010

Characteristic / Category                                              Percent
Full-time Students                                                      40.9%
Students who intend to transfer to a 4-year educational institution     48.6%
Students under age 25                                                   61.0%
TOTAL                                                                  150.5%

Table 1.5
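
As a quick check of the reasoning above, the short Python sketch below (our own illustration; the variable names are assumptions) adds the three percentages from Table 1.5 and tests whether they total 100%, which is the condition a pie chart needs.

```python
# Percentages copied from Table 1.5. A student can fall into more than one category.
category_percents = {
    "Full-time students": 40.9,
    "Intend to transfer to a 4-year institution": 48.6,
    "Under age 25": 61.0,
}

# Round to one decimal place to avoid floating-point noise in the sum.
total = round(sum(category_percents.values()), 1)

# A pie chart only makes sense when the wedges account for exactly 100%.
pie_chart_ok = total == 100.0

print(total, pie_chart_ok)
```

Since the total is not 100%, a bar graph is the appropriate display for these data.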
[Figure: horizontal bar graph with bars for Under Age 25 (61.0%), Intend to Transfer, Full-Time (40.9%),
and All Students (100.0%); the horizontal axis runs from 0% to 100%.]

Table 1.6

Omitting  Categories/Missing  Data 

The table displays Ethnicity of Students but is missing the "Other/Unknown" category. This category
contains people who did not feel they fit into any of the ethnicity categories or declined to respond. Notice that
the frequencies do not add up to the total number of students. Create a bar graph and not a pie chart.


Missing Data: Ethnicity of Students De Anza College Fall Term 2007 (Census Day)

                    Frequency               Percent
Asian               8,794                   36.1%
Black               1,412                   5.8%
Filipino            1,298                   5.3%
Hispanic            4,180                   17.1%
Native American     146                     0.6%
Pacific Islander    236                     1.0%
White               5,978                   24.5%
TOTAL               22,044 out of 24,382    90.4% out of 100%

Table 1.7
[Figure: Bar Graph Without Other/Unknown Category]

Table 1.8

The following graph is the same as the previous graph but the "Other/Unknown" percent (9.6%) has been
added back in. The "Other/Unknown" category is large compared to some of the other categories (Native
American, 0.6%, Pacific Islander 1.0% particularly). This is important to know when we think about what
the data are telling us.

This  particular  bar  graph  can  be  hard  to  understand  visually.  The  graph  below  it  is  a  Pareto  chart. 
The  Pareto  chart  has  the  bars  sorted  from  largest  to  smallest  and  is  easier  to  read  and  interpret. 
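
The "Other/Unknown" figure and the Pareto ordering can both be recovered from the frequencies in Table 1.7. The Python sketch below is our own illustration (the names `counts`, `percents`, and `pareto_order` are assumptions). It computes the missing category as the difference between the total enrollment and the listed frequencies, then sorts the categories from largest to smallest.

```python
# Frequencies copied from Table 1.7 (De Anza College, Fall Term 2007).
counts = {
    "Asian": 8794,
    "Black": 1412,
    "Filipino": 1298,
    "Hispanic": 4180,
    "Native American": 146,
    "Pacific Islander": 236,
    "White": 5978,
}
total_students = 24382

# Students not listed in any category make up the "Other/Unknown" group.
counts["Other/Unknown"] = total_students - sum(counts.values())

# Relative frequencies as percents, rounded to one decimal place.
percents = {name: round(100 * n / total_students, 1) for name, n in counts.items()}

# A Pareto chart orders the bars from the largest category to the smallest.
pareto_order = sorted(counts, key=counts.get, reverse=True)

for name in pareto_order:
    print(f"{name:18s}{counts[name]:>7,d}{percents[name]:>7.1f}%")
```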

[Figure: Bar Graph With Other/Unknown Category, titled "Ethnicity of Students"]

Table 1.9

[Figure: Pareto Chart With Bars Sorted By Size]

Table 1.10

Pie  Charts:  No  Missing  Data 

The following pie charts have the "Other/Unknown" category added back in (since the percentages must
add to 100%). The chart on the right has its wedges ordered by size and makes for a more visually
informative graph than the unsorted, alphabetical graph on the left.
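[Figure: two pie charts of Ethnicity of Students including the Other/Unknown category, one with wedges in
alphabetical order and one with wedges sorted by size. Legend: Asian, White, Hispanic, Other, Black,
Filipino, Pacific Islander, Native American. One visible label: Hispanic 17.1%.]

Table 1.11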


1.6 Sampling*

Gathering  information  about  an  entire  population  often  costs  too  much  or  is  virtually  impossible.  Instead, 
we  use  a  sample  of  the  population.  A  sample  should  have  the  same  characteristics  as  the  population  it 
is  representing.  Most  statisticians  use  various  methods  of  random  sampling  in  an  attempt  to  achieve  this 
goal.  This  section  will  describe  a  few  of  the  most  common  methods. 

*This content is available online at <http://cnx.org/content/m16014/1.17/>.

There are several different methods of random sampling. In each form of random sampling, each member
of a population initially has an equal chance of being selected for the sample. Each method has pros and
cons. The easiest method to describe is called a simple random sample. Any group of n individuals is
equally likely to be chosen as any other group of n individuals if the simple random sampling technique is
used. In other words, each sample of the same size has an equal chance of being selected. For example,
suppose Lisa wants to form a four-person study group (herself and three other people) from her pre-calculus
class, which has 31 members not including Lisa. To choose a simple random sample of size 3 from the other
members of her class, Lisa could put all 31 names in a hat, shake the hat, close her eyes, and pick out 3
names. A more technological way is for Lisa to first list the last names of the members of her class together
with a two-digit number as shown below.
Class Roster

ID    Name
00    Anselmo
01    Bautista
02    Bayani
03    Cheng
04    Cuarismo
05    Cunningham
06    Fontecha
07    Hong
08    Hoobler
09    Jiao
10    Khan
11    King
12    Legeny
13    Lundquist
14    Macierz
15    Motogawa
16    Okimoto
17    Patel
18    Price
19    Quizon
20    Reyes
21    Roquero
22    Roth
23    Rowell
24    Salangsang
25    Slade
26    Stracher
27    Tallai
28    Tran
29    Wai
30    Wood

Table 1.12

Lisa  can  either  use  a  table  of  random  numbers  (found  in  many  statistics  books  as  well  as  mathematical 
handbooks)  or  a  calculator  or  computer  to  generate  random  numbers.  For  this  example,  suppose  Lisa 
chooses  to  generate  random  numbers  from  a  calculator.  The  numbers  generated  are: 
.94360; .99832; .14669; .51470; .40581; .73381; .04399

Lisa  reads  two-digit  groups  until  she  has  chosen  three  class  members  (that  is,  she  reads  .94360  as  the  groups 
94,  43,  36,  60).  Each  random  number  may  only  contribute  one  class  member.  If  she  needed  to,  Lisa  could 
have  generated  more  random  numbers. 

The random numbers .94360 and .99832 do not contain appropriate two-digit numbers. However, the third
random number, .14669, contains 14 (the fourth random number also contains 14), the fifth random number
contains 05, and the seventh random number contains 04. The two-digit number 14 corresponds to Macierz,
05 corresponds to Cunningham, and 04 corresponds to Cuarismo. Besides herself, Lisa's group will consist
of Macierz, Cunningham, and Cuarismo.
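
Lisa's procedure can be traced step by step with a short program. The Python sketch below is our own illustration under stated assumptions: the function names are hypothetical, and we assume that the first usable two-digit group in each random number is the one that counts, with IDs that have already been chosen skipped.

```python
# Roster from Table 1.12; the list index is the two-digit ID (00 through 30).
roster = [
    "Anselmo", "Bautista", "Bayani", "Cheng", "Cuarismo", "Cunningham", "Fontecha",
    "Hong", "Hoobler", "Jiao", "Khan", "King", "Legeny", "Lundquist", "Macierz",
    "Motogawa", "Okimoto", "Patel", "Price", "Quizon", "Reyes", "Roquero", "Roth",
    "Rowell", "Salangsang", "Slade", "Stracher", "Tallai", "Tran", "Wai", "Wood",
]

# Digits of the calculator's random numbers, kept as strings to preserve leading zeros.
random_digits = ["94360", "99832", "14669", "51470", "40581", "73381", "04399"]


def two_digit_groups(digits):
    """Read overlapping two-digit groups, e.g. '94360' gives 94, 43, 36, 60."""
    return [int(digits[i:i + 2]) for i in range(len(digits) - 1)]


def pick_members(numbers, roster_size, sample_size):
    """Choose IDs; each random number may contribute at most one class member."""
    chosen = []
    for digits in numbers:
        for group in two_digit_groups(digits):
            if group < roster_size and group not in chosen:
                chosen.append(group)
                break  # move on to the next random number
        if len(chosen) == sample_size:
            break
    return chosen


ids = pick_members(random_digits, len(roster), 3)
names = [roster[i] for i in ids]
print(ids, names)
```

Run on the seven numbers above, the sketch selects the same three IDs as the text: 14, 05, and 04.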

Besides simple random sampling, there are other forms of sampling that involve a chance process for
getting the sample. Other well-known random sampling methods are the stratified sample, the cluster
sample, and the systematic sample.

To choose a stratified sample, divide the population into groups called strata and then take a proportionate
number from each stratum. For example, you could stratify (group) your college population by department
and then choose a proportionate simple random sample from each stratum (each department) to get a
stratified random sample. To choose a simple random sample from each department, number each member of
the first department, number each member of the second department and do the same for the remaining
departments. Then use simple random sampling to choose proportionate numbers from the first department
and do the same for each of the remaining departments. Those numbers picked from the first department,
picked from the second department and so on represent the members who make up the stratified sample.
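
A minimal Python sketch of a proportionate stratified sample follows. The department names, their sizes, and the 10% sampling fraction are made up for illustration, and `stratified_sample` is a hypothetical helper rather than a standard library function.

```python
import random


def stratified_sample(strata, fraction, rng):
    """Take a simple random sample of the same fraction from every stratum."""
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # proportionate number for this stratum
        sample.extend(rng.sample(members, k))  # sampling without replacement
    return sample


# Hypothetical college population grouped (stratified) by department.
departments = {
    "Math": [f"math-{i}" for i in range(40)],
    "English": [f"english-{i}" for i in range(60)],
    "Art": [f"art-{i}" for i in range(20)],
}

rng = random.Random(0)  # fixed seed so the sketch is repeatable
sample = stratified_sample(departments, 0.10, rng)
print(sample)
```

Every department is represented, and larger departments contribute proportionately more members.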

To choose a cluster sample, divide the population into clusters (groups) and then randomly select some of
the clusters. All the members from these clusters are in the cluster sample. For example, if you randomly
sample four departments from your college population, the four departments make up the cluster sample.
For example, divide your college faculty by department. The departments are the clusters. Number each
department and then choose four different numbers using simple random sampling. All members of the
four departments with those numbers are the cluster sample.
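
The contrast with stratified sampling is easier to see in code. In the Python sketch below (department names and sizes are invented, and `cluster_sample` is a hypothetical helper), whole departments are chosen at random and every member of a chosen department is included.

```python
import random


def cluster_sample(clusters, n_clusters, rng):
    """Randomly choose whole clusters, then include every member of each one."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [person for name in chosen for person in clusters[name]]
    return chosen, members


# Hypothetical faculty lists, one cluster per department.
faculty = {
    "Art": ["art-1", "art-2"],
    "Biology": ["bio-1", "bio-2", "bio-3"],
    "Chemistry": ["chem-1", "chem-2"],
    "English": ["eng-1", "eng-2", "eng-3", "eng-4"],
    "History": ["hist-1"],
    "Math": ["math-1", "math-2", "math-3"],
}

rng = random.Random(1)  # fixed seed so the sketch is repeatable
chosen, members = cluster_sample(faculty, 4, rng)
print(chosen, members)
```

Here randomness enters only in the choice of departments; departments that are not chosen contribute no one.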

To choose a systematic sample, randomly select a starting point and take every nth piece of data from a
listing of the population. For example, suppose you have to do a phone survey. Your phone book contains
20,000 residence listings. You must choose 400 names for the sample. Number the population 1 - 20,000
and then use a simple random sample to pick a number that represents the first name of the sample. Then
choose every 50th name thereafter until you have a total of 400 names (you might have to go back to the
beginning of your phone list). Systematic sampling is frequently chosen because it is a simple method.
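
The phone survey can be sketched in Python as follows. This is an illustration only: `systematic_sample` is a hypothetical helper, and the wrap-around to the beginning of the list is handled with modular arithmetic.

```python
import random


def systematic_sample(population_size, sample_size, rng):
    """Pick a random start, then every k-th listing, wrapping to the start of the list."""
    step = population_size // sample_size  # 20,000 // 400 = 50
    start = rng.randint(1, population_size)  # listings are numbered 1 to population_size
    return [((start - 1 + i * step) % population_size) + 1 for i in range(sample_size)]


rng = random.Random(7)  # fixed seed so the sketch is repeatable
picks = systematic_sample(20_000, 400, rng)
print(picks[:5])
```

Because 400 steps of 50 cover the list exactly once, no listing is chosen twice.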

A type of sampling that is nonrandom is convenience sampling. Convenience sampling involves using
results that are readily available. For example, a computer software store conducts a marketing study by
interviewing potential customers who happen to be in the store browsing through the available software.
The results of convenience sampling may be very good in some cases and highly biased (favoring certain
outcomes) in others.

Sampling data should be done very carefully. Collecting data carelessly can have devastating results.
Surveys mailed to households and then returned may be very biased (for example, they may favor a certain
group). It is better for the person conducting the survey to select the sample respondents.

True random sampling is done with replacement. That is, once a member is picked that member goes
back into the population and thus may be chosen more than once. However, for practical reasons, in most
populations, simple random sampling is done without replacement. Surveys are typically done without
replacement. That is, a member of the population may be chosen only once. Most samples are taken from
large populations and the sample tends to be small in comparison to the population. Since this is the case,
sampling without replacement is approximately the same as sampling with replacement because the chance
of picking the same individual more than once with replacement is very low.

For  example,  in  a  college  population  of  10,000  people,  suppose  you  want  to  randomly  pick  a  sample  of  1000 
for  a  survey.  For  any  particular  sample  of  1000,  if  you  are  sampling  with  replacement, 

•  the  chance  of  picking  the  first  person  is  1000  out  of  10,000  (0.1000); 

•  the  chance  of  picking  a  different  second  person  for  this  sample  is  999  out  of  10,000  (0.0999); 

•  the  chance  of  picking  the  same  person  again  is  1  out  of  10,000  (very  low). 

If  you  are  sampling  without  replacement, 

•  the  chance  of  picking  the  first  person  for  any  particular  sample  is  1000  out  of  10,000  (0.1000); 

•  the  chance  of  picking  a  different  second  person  is  999  out  of  9,999  (0.0999); 

•  you  do  not  replace  the  first  person  before  picking  the  next  person. 

Compare the fractions 999/10,000 and 999/9,999. For accuracy, carry the decimal answers to 4 decimal
places. To 4 decimal places, these numbers are equivalent (0.0999).

Sampling  without  replacement  instead  of  sampling  with  replacement  only  becomes  a  mathematics  issue 
when  the  population  is  small  which  is  not  that  common.  For  example,  if  the  population  is  25  people,  the 
sample  is  10  and  you  are  sampling  with  replacement  for  any  particular  sample, 

•  the  chance  of  picking  the  first  person  is  10  out  of  25  and  a  different  second  person  is  9  out  of  25  (you 
replace  the  first  person). 

If  you  sample  without  replacement, 

•  the  chance  of  picking  the  first  person  is  10  out  of  25  and  then  the  second  person  (which  is  different)  is 
9  out  of  24  (you  do  not  replace  the  first  person). 

Compare the fractions 9/25 and 9/24. To 4 decimal places, 9/25 = 0.3600 and 9/24 = 0.3750. To 4 decimal
places, these numbers are not equivalent.
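
Both comparisons can be checked with exact fractions. The Python sketch below is our own illustration (`second_pick_chances` is a hypothetical helper). It computes the chance of picking a different second person with and without replacement, then formats each to 4 decimal places.

```python
from fractions import Fraction


def second_pick_chances(population, sample):
    """Chance of picking a different second person, with and without replacement."""
    with_replacement = Fraction(sample - 1, population)
    without_replacement = Fraction(sample - 1, population - 1)
    return with_replacement, without_replacement


large = second_pick_chances(10_000, 1_000)  # 999/10,000 and 999/9,999
small = second_pick_chances(25, 10)         # 9/25 and 9/24

for label, (with_r, without_r) in (("large population", large), ("small population", small)):
    print(f"{label}: {float(with_r):.4f} vs {float(without_r):.4f}")
```

For the large population the two values agree to 4 decimal places; for the small population they do not.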

When you analyze data, it is important to be aware of sampling errors and nonsampling errors. The actual
process of sampling causes sampling errors. For example, the sample may not be large enough. Factors
not related to the sampling process cause nonsampling errors. A defective counting device can cause a
nonsampling error.

In  reality,  a  sample  will  never  be  exactly  representative  of  the  population  so  there  will  always  be 
some  sampling  error.  As  a  rule,  the  larger  the  sample,  the  smaller  the  sampling  error. 

In statistics, a sampling bias is created when a sample is collected from a population and some
members of the population are not as likely to be chosen as others (remember, each member of the
population should have an equally likely chance of being chosen). When a sampling bias happens, there
can be incorrect conclusions drawn about the population that is being studied.

Example  1.6 

Determine the type of sampling used (simple random, stratified, systematic, cluster, or
convenience).

1.  A  soccer  coach  selects  6  players  from  a  group  of  boys  aged  8  to  10,  7  players  from  a  group  of 
boys  aged  11  to  12,  and  3  players  from  a  group  of  boys  aged  13  to  14  to  form  a  recreational 
soccer  team. 

2.  A  pollster  interviews  all  human  resource  personnel  in  five  different  high  tech  companies. 
3. A high school educational researcher interviews 50 high school female teachers and 50 high
school male teachers.

4.  A  medical  researcher  interviews  every  third  cancer  patient  from  a  list  of  cancer  patients  at  a 
local  hospital. 

5.  A  high  school  counselor  uses  a  computer  to  generate  50  random  numbers  and  then  picks 
students  whose  names  correspond  to  the  numbers. 

6.  A  student  interviews  classmates  in  his  algebra  class  to  determine  how  many  pairs  of  jeans  a 
student  owns,  on  the  average. 

Solution 

1.  stratified 

2.  cluster 

3.  stratified 

4.  systematic 

5.  simple  random 

6.  convenience 


If we were to examine two samples representing the same population, even if we used random sampling
methods for the samples, they would not be exactly the same. Just as there is variation in data, there is
variation in samples. As you become accustomed to sampling, the variability will seem natural.

Example  1.7 

Suppose  ABC  College  has  10,000  part-time  students  (the  population).  We  are  interested  in  the 
average  amount  of  money  a  part-time  student  spends  on  books  in  the  fall  term.  Asking  all  10,000 
students  is  an  almost  impossible  task. 

Suppose  we  take  two  different  samples. 

First, we use convenience sampling and survey 10 students from a first term organic chemistry
class. Many of these students are taking first term calculus in addition to the organic chemistry
class. The amount of money they spend is as follows:

$128;  $87;  $173;  $116;  $130;  $204;  $147;  $189;  $93;  $153 

The second sample is taken by using a list from the P.E. department of senior citizens who take
P.E. classes and taking every 5th senior citizen on the list, for a total of 10 senior citizens. They
spend:

$50;  $40;  $36;  $15;  $50;  $100;  $40;  $53;  $22;  $22 
Problem  1 

Do you think that either of these samples is representative of (or is characteristic of) the entire
10,000 part-time student population?

Solution 

No.  The  first  sample  probably  consists  of  science-oriented  students.  Besides  the  chemistry  course, 
some  of  them  are  taking  first-term  calculus.  Books  for  these  classes  tend  to  be  expensive.  Most 
of  these  students  are,  more  than  likely,  paying  more  than  the  average  part-time  student  for  their 
books.  The  second  sample  is  a  group  of  senior  citizens  who  are,  more  than  likely,  taking  courses 
for  health  and  interest.  The  amount  of  money  they  spend  on  books  is  probably  much  less  than  the 
average  part-time  student.  Both  samples  are  biased.  Also,  in  both  cases,  not  all  students  have  a 
chance  to  be  in  either  sample. 
Problem 2

Since these samples are not representative of the entire population, is it wise to use the results to
describe the entire population?

Solution 

No.  For  these  samples,  each  member  of  the  population  did  not  have  an  equally  likely  chance  of 
being  chosen. 

Now, suppose we take a third sample. We choose ten different part-time students from the
disciplines of chemistry, math, English, psychology, sociology, history, nursing, physical education,
art, and early childhood development. (We assume that these are the only disciplines in which
part-time students at ABC College are enrolled and that an equal number of part-time students
are enrolled in each of the disciplines.) Each student is chosen using simple random sampling.
Using a calculator, random numbers are generated and a student from a particular discipline is
selected if he/she has a corresponding number. The students spend:

$180;  $50;  $150;  $85;  $260;  $75;  $180;  $200;  $200;  $150 
Problem  3 

Is  the  sample  biased? 
Solution 

The  sample  is  unbiased,  but  a  larger  sample  would  be  recommended  to  increase  the  likelihood 
that  the  sample  will  be  close  to  representative  of  the  population.  However,  for  a  biased  sampling 
technique,  even  a  large  sample  runs  the  risk  of  not  being  representative  of  the  population. 
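
To see how far apart the three samples in this example are, their means can be computed directly. The Python sketch below is our own illustration (the dictionary labels are assumptions) and uses only the dollar amounts listed in the example.

```python
from statistics import mean

# Dollar amounts spent on books, copied from the three samples in Example 1.7.
samples = {
    "organic chemistry class": [128, 87, 173, 116, 130, 204, 147, 189, 93, 153],
    "senior citizens in P.E.": [50, 40, 36, 15, 50, 100, 40, 53, 22, 22],
    "one student per discipline": [180, 50, 150, 85, 260, 75, 180, 200, 200, 150],
}

sample_means = {name: mean(values) for name, values in samples.items()}

for name, value in sample_means.items():
    print(f"{name}: ${value:.2f}")
```

The senior citizen sample has a far lower mean than the other two, which is consistent with the bias described in the solution to Problem 1. The sample means alone cannot tell us the population mean, which is unknown here.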

Students often ask if it is "good enough" to take a sample, instead of surveying the entire
population. If the survey is done well, the answer is yes.


1.6.1  Optional  Collaborative  Classroom  Exercise 
Exercise  1.6.1 

As  a  class,  determine  whether  or  not  the  following  samples  are  representative.  If  they  are  not, 
discuss  the  reasons. 

1. To find the average GPA of all students in a university, use all honor students at the
university as the sample.

2.  To  find  out  the  most  popular  cereal  among  young  people  under  the  age  of  10,  stand  outside 
a  large  supermarket  for  three  hours  and  speak  to  every  20th  child  under  age  10  who  enters 
the  supermarket. 

3. To find the average annual income of all adults in the United States, sample U.S.
congressmen. Create a cluster sample by considering each state as a stratum (group). By using simple
random sampling, select states to be part of the cluster. Then survey every U.S. congressman
in the cluster.

4. To determine the proportion of people taking public transportation to work, survey 20
people in New York City. Conduct the survey by sitting in Central Park on a bench and
interviewing every person who sits next to you.

5.  To  determine  the  average  cost  of  a  two  day  stay  in  a  hospital  in  Massachusetts,  survey  100 
hospitals  across  the  state  using  simple  random  sampling. 
1.7 Variation

1.7.1  Variation  in  Data 

Variation  is  present  in  any  set  of  data.  For  example,  16-ounce  cans  of  beverage  may  contain  more  or  less 
than  16  ounces  of  liquid.  In  one  study,  eight  16  ounce  cans  were  measured  and  produced  the  following 
amount  (in  ounces)  of  beverage: 

15.8;  16.1;  15.2;  14.8;  15.8;  15.9;  16.0;  15.5 

Measurements of the amount of beverage in a 16-ounce can may vary because different people make the
measurements or because the exact amount, 16 ounces of liquid, was not put into the cans. Manufacturers
regularly run tests to determine if the amount of beverage in a 16-ounce can falls within the desired range.
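
The variation in the eight measurements can be summarized with a few numbers. The Python sketch below is our own illustration (the variable names are assumptions). It reports the mean, the spread between the largest and smallest measurement, and how many cans fall below the labeled 16 ounces.

```python
from statistics import mean

# Measured amounts (in ounces) for the eight 16-ounce cans in the study above.
ounces = [15.8, 16.1, 15.2, 14.8, 15.8, 15.9, 16.0, 15.5]

average = mean(ounces)
spread = max(ounces) - min(ounces)  # largest minus smallest measurement
below_label = [x for x in ounces if x < 16.0]

print(f"mean = {average:.4f} oz, range = {spread:.1f} oz, "
      f"{len(below_label)} of {len(ounces)} cans below 16 oz")
```

Whether this much variation is acceptable depends on the manufacturer's desired range, which the text does not specify.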

Be  aware  that  as  you  take  data,  your  data  may  vary  somewhat  from  the  data  someone  else  is  taking  for  the 
same  purpose.  This  is  completely  natural.  However,  if  two  or  more  of  you  are  taking  the  same  data  and 
get  very  different  results,  it  is  time  for  you  and  the  others  to  reevaluate  your  data-taking  methods  and  your 
accuracy. 

1.7.2  Variation  in  Samples 

It was mentioned previously that two or more samples from the same population, even when taken randomly
and having close to the same characteristics as the population, are different from each other. Suppose Doreen and
Jung both decide to study the average amount of time students at their college sleep each night. Doreen and
Jung each take samples of 500 students. Doreen uses systematic sampling and Jung uses cluster sampling.
Doreen's sample will be different from Jung's sample. Even if Doreen and Jung used the same sampling
method, in all likelihood their samples would be different. Neither would be wrong, however.

Think  about  what  contributes  to  making  Doreen's  and  Jung's  samples  different. 

If  Doreen  and  Jung  took  larger  samples  (i.e.  the  number  of  data  values  is  increased),  their  sample  results 
(the  average  amount  of  time  a  student  sleeps)  might  be  closer  to  the  actual  population  average.  But  still, 
their  samples  would  be,  in  all  likelihood,  different  from  each  other.  This  variability  in  samples  cannot  be 
stressed  enough. 

1.7.2.1  Size  of  a  Sample 

The size of a sample (often called the number of observations) is important. The examples you have seen
in this book so far have been small. Samples of only a few hundred observations, or even smaller, are
sufficient for many purposes. In polling, samples that are from 1200 to 1500 observations are considered
large enough and good enough if the survey is random and is well done. You will learn why when you
study confidence intervals.

Be aware that many large samples are biased. For example, call-in surveys are invariably biased
because people choose to respond or not.

This content is available online at <http://cnx.org/content/m16021/1.15/>.


Available for free at Connexions <http://cnx.org/content/col10522/1.40>




1.7.2.2  Optional  Collaborative  Classroom  Exercise 
Exercise  1.7.1 

Divide  into  groups  of  two,  three,  or  four.  Your  instructor  will  give  each  group  one  6-sided  die. 
Try  this  experiment  twice.  Roll  one  fair  die  (6-sided)  20  times.  Record  the  number  of  ones,  twos, 
threes, fours, fives, and sixes you get below ("frequency" is the number of times a particular face
of the die occurs):

First Experiment (20 rolls)

Face on Die    Frequency
1
2
3
4
5
6

Table 1.13

Second Experiment (20 rolls)

Face on Die    Frequency
1
2
3
4
5
6

Table 1.14


Did the two experiments have the same results? Probably not. If you did the experiment a third
time, do you expect the results to be identical to the first or second experiment? (Answer yes or
no.) Why or why not?

Which  experiment  had  the  correct  results?  They  both  did.  The  job  of  the  statistician  is  to  see 
through  the  variability  and  draw  appropriate  conclusions. 
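
If no die is at hand, the experiment can be imitated on a computer. The sketch below is one possible way to do it; the function name roll_experiment is made up for this illustration, and a pseudo-random number generator stands in for the physical die.

```python
import random
from collections import Counter

def roll_experiment(num_rolls=20):
    """Roll a fair 6-sided die num_rolls times and count each face."""
    rolls = [random.randint(1, 6) for _ in range(num_rolls)]
    counts = Counter(rolls)
    # Report every face, including faces that never came up.
    return {face: counts.get(face, 0) for face in range(1, 7)}

first = roll_experiment()
second = roll_experiment()

print("Face  First  Second")
for face in range(1, 7):
    print(f"{face:>4}  {first[face]:>5}  {second[face]:>6}")
```

Each run will most likely print different frequencies, which is the point of the exercise: the frequencies in each column always add up to 20, but the two columns rarely agree.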


1.7.3  Critical  Evaluation 

We need to critically evaluate the statistical studies we read about and analyze before accepting their
results. Common problems to be aware of include:

• Problems with Samples: A sample should be representative of the population. A sample that is not
representative of the population is biased. Biased samples give results that are inaccurate and not valid.

• Self-Selected Samples: Responses only by people who choose to respond, such as call-in surveys, are
often unreliable.

•  Sample  Size  Issues:  Samples  that  are  too  small  may  be  unreliable.  Larger  samples  are  better  if  possible. 
In  some  situations,  small  samples  are  unavoidable  and  can  still  be  used  to  draw  conclusions,  even 
though  larger  samples  are  better.  Examples:  Crash  testing  cars,  medical  testing  for  rare  conditions. 

•  Undue  influence:  Collecting  data  or  asking  questions  in  a  way  that  influences  the  response. 

• Non-response or refusal of subject to participate: The collected responses may no longer be
representative of the population. Often, people with strong positive or negative opinions may answer surveys,
which can affect the results.

•  Causality:  A  relationship  between  two  variables  does  not  mean  that  one  causes  the  other  to  occur. 
They  may  both  be  related  (correlated)  because  of  their  relationship  through  a  different  variable. 

• Self-Funded or Self-Interest Studies: A study performed by a person or organization in order to
support their claim. Is the study impartial? Read the study carefully to evaluate the work. Do not
automatically assume that the study is good, but do not automatically assume the study is bad either.
Evaluate it on its merits and the work done.

•  Misleading  Use  of  Data:  Improperly  displayed  graphs,  incomplete  data,  lack  of  context. 

•  Confounding:  When  the  effects  of  multiple  factors  on  a  response  cannot  be  separated.  Confounding 
makes  it  difficult  or  impossible  to  draw  valid  conclusions  about  the  effect  of  each  factor. 


1.8  Answers  and  Rounding  Off 

A  simple  way  to  round  off  answers  is  to  carry  your  final  answer  one  more  decimal  place  than  was  present 
in  the  original  data.  Round  only  the  final  answer.  Do  not  round  any  intermediate  results,  if  possible.  If  it 
becomes  necessary  to  round  intermediate  results,  carry  them  to  at  least  twice  as  many  decimal  places  as  the 
final  answer.  For  example,  the  average  of  the  three  quiz  scores  4,  6,  9  is  6.3,  rounded  to  the  nearest  tenth, 
because  the  data  are  whole  numbers.  Most  answers  will  be  rounded  in  this  manner. 
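
The quiz-score example can be checked with a few lines of Python. This is only a sketch of the rounding rule above; the variable names are chosen for illustration.

```python
quiz_scores = [4, 6, 9]  # whole numbers, so the data have zero decimal places

# Keep full precision in the intermediate result.
mean_score = sum(quiz_scores) / len(quiz_scores)

# Round only the final answer, to one more decimal place than the data.
final_answer = round(mean_score, 1)

print(final_answer)  # prints 6.3
```

Note that Python's built-in round sends exact ties to the nearest even digit, so in borderline cases it may differ from rounding done by hand.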

It  is  not  necessary  to  reduce  most  fractions  in  this  course.  Especially  in  Probability  Topics  (Section  3.1),  the 
chapter  on  probability,  it  is  more  helpful  to  leave  an  answer  as  an  unreduced  fraction. 

1.9 Frequency

Twenty  students  were  asked  how  many  hours  they  worked  per  day.  Their  responses,  in  hours,  are  listed 
below: 

5;  6;  3;  3;  2;  4;  7;  5;  2;  3;  5;  6;  5;  4;  4;  3;  5;  2;  5;  3 

Below  is  a  frequency  table  listing  the  different  data  values  in  ascending  order  and  their  frequencies. 

This content is available online at <http://cnx.org/content/m16006/1.8/>.
This content is available online at <http://cnx.org/content/m16012/1.20/>.

Frequency Table of Student Work Hours

DATA VALUE    FREQUENCY
2             3
3             5
4             3
5             6
6             2
7             1

Table 1.15


A  frequency  is  the  number  of  times  a  given  datum  occurs  in  a  data  set.  According  to  the  table  above, 
there  are  three  students  who  work  2  hours,  five  students  who  work  3  hours,  etc.  The  total  of  the  frequency 
column,  20,  represents  the  total  number  of  students  included  in  the  sample. 

A relative frequency is the fraction or proportion of times an answer occurs. To find the relative
frequencies, divide each frequency by the total number of students in the sample - in this case, 20. Relative
frequencies can be written as fractions, percents, or decimals.

Frequency Table of Student Work Hours w/ Relative Frequency

DATA VALUE    FREQUENCY    RELATIVE FREQUENCY
2             3            3/20 or 0.15
3             5            5/20 or 0.25
4             3            3/20 or 0.15
5             6            6/20 or 0.30
6             2            2/20 or 0.10
7             1            1/20 or 0.05

Table 1.16

The sum of the relative frequency column is 20/20 or 1.
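
The frequencies and relative frequencies for the work-hours data can be reproduced with a short program. The sketch below uses Python's Counter to do the counting and Fraction to keep the arithmetic exact; the variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction

# Hours worked per day, as reported by the twenty students.
hours = [5, 6, 3, 3, 2, 4, 7, 5, 2, 3, 5, 6, 5, 4, 4, 3, 5, 2, 5, 3]

frequency = Counter(hours)  # data value -> number of times it occurs
n = len(hours)              # total number of students in the sample

# Relative frequency = frequency divided by the sample size.
relative_frequency = {
    value: Fraction(frequency[value], n) for value in sorted(frequency)
}

for value, rel_freq in relative_frequency.items():
    # Show the unreduced fraction alongside the decimal form.
    print(f"{value}  {frequency[value]}  {frequency[value]}/{n} or {float(rel_freq):.2f}")

total = sum(relative_frequency.values())
print("Sum of relative frequencies:", total)
```

Because the fractions are exact, the relative frequencies add up to exactly 1.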

Cumulative relative frequency is the accumulation of the previous relative frequencies. To find the
cumulative relative frequencies, add all the previous relative frequencies to the relative frequency for the current
row.


Frequency Table of Student Work Hours w/ Relative and Cumulative Relative Frequency

DATA VALUE    FREQUENCY    RELATIVE FREQUENCY    CUMULATIVE RELATIVE FREQUENCY
2             3            3/20 or 0.15          0.15
3             5            5/20 or 0.25          0.15 + 0.25 = 0.40
4             3            3/20 or 0.15          0.40 + 0.15 = 0.55
5             6            6/20 or 0.30          0.55 + 0.30 = 0.85
6             2            2/20 or 0.10          0.85 + 0.10 = 0.95
7             1            1/20 or 0.05          0.95 + 0.05 = 1.00

Table 1.17


The last entry of the cumulative relative frequency column is one, indicating that one hundred percent of
the data has been accumulated.

NOTE:  Because  of  rounding,  the  relative  frequency  column  may  not  always  sum  to  one  and  the  last 
entry  in  the  cumulative  relative  frequency  column  may  not  be  one.  However,  they  each  should  be 
close  to  one. 
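
A running total is all that is needed to build the cumulative relative frequency column. The sketch below uses the work-hours frequencies from the table and Python's itertools.accumulate; because it works with decimals, the final total may differ from one by a tiny amount, much like the rounding effect mentioned in the note.

```python
from itertools import accumulate

data_values = [2, 3, 4, 5, 6, 7]
frequencies = [3, 5, 3, 6, 2, 1]
n = sum(frequencies)  # total number of students

relative = [freq / n for freq in frequencies]

# Each entry is the sum of all relative frequencies up to and including that row.
cumulative = list(accumulate(relative))

for value, rel_freq, cum_rel_freq in zip(data_values, relative, cumulative):
    print(f"{value}  {rel_freq:.2f}  {cum_rel_freq:.2f}")
```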

The following table represents the heights, in inches, of a sample of 100 male semiprofessional soccer
players.


Frequency Table of Soccer Player Height

HEIGHTS (INCHES)    FREQUENCY      RELATIVE FREQUENCY    CUMULATIVE RELATIVE FREQUENCY
59.95 - 61.95       5              5/100 = 0.05          0.05
61.95 - 63.95       3              3/100 = 0.03          0.05 + 0.03 = 0.08
63.95 - 65.95       15             15/100 = 0.15         0.08 + 0.15 = 0.23
65.95 - 67.95       40             40/100 = 0.40         0.23 + 0.40 = 0.63
67.95 - 69.95       17             17/100 = 0.17         0.63 + 0.17 = 0.80
69.95 - 71.95       12             12/100 = 0.12         0.80 + 0.12 = 0.92
71.95 - 73.95       7              7/100 = 0.07          0.92 + 0.07 = 0.99
73.95 - 75.95       1              1/100 = 0.01          0.99 + 0.01 = 1.00
                    Total = 100    Total = 1.00

Table 1.18


The  data  in  this  table  has  been  grouped  into  the  following  intervals: 

•  59.95  -  61.95  inches 

•  61.95  -  63.95  inches 

•  63.95  -  65.95  inches 

•  65.95  -  67.95  inches 

•  67.95  -  69.95  inches 

• 69.95 - 71.95 inches

• 71.95 - 73.95 inches

• 73.95 - 75.95 inches

NOTE: This example is used again in the Descriptive Statistics (Section 2.1) chapter, where the
method used to compute the intervals will be explained.

In this sample, there are 5 players whose heights fall within the interval 59.95 - 61.95 inches, 3 players
whose heights fall within the interval 61.95 - 63.95 inches, 15 players whose heights fall within the interval
63.95 - 65.95 inches, 40 players whose heights fall within the interval 65.95 - 67.95 inches, 17 players whose
heights fall within the interval 67.95 - 69.95 inches, 12 players whose heights fall within the interval
69.95 - 71.95 inches, 7 players whose heights fall within the interval 71.95 - 73.95 inches, and 1 player whose
height falls within the interval 73.95 - 75.95 inches. All heights fall between the endpoints of an interval
and not at the endpoints.

Example  1.8 

From  the  table,  find  the  percentage  of  heights  that  are  less  than  65.95  inches. 
Solution 

If you look at the first, second, and third rows, the heights are all less than 65.95 inches. There are
5 + 3 + 15 = 23 males whose heights are less than 65.95 inches. The percentage of heights less than
65.95 inches is then 23/100 or 23%. This percentage is the cumulative relative frequency entry in the
third row.


Example  1.9 

From  the  table,  find  the  percentage  of  heights  that  fall  between  61.95  and  65.95  inches. 
Solution 

Add  the  relative  frequencies  in  the  second  and  third  rows:  0.03  +  0.15  =  0.18  or  18%. 
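
The two solutions above can be checked with a small program. The sketch below stores each row of the height table as (lower endpoint, upper endpoint, frequency); the helper name percent_between is made up for this illustration, and it only gives sensible answers when both limits are endpoints of intervals in the table.

```python
# (lower endpoint, upper endpoint, frequency) for each row of the height table
height_table = [
    (59.95, 61.95, 5),
    (61.95, 63.95, 3),
    (63.95, 65.95, 15),
    (65.95, 67.95, 40),
    (67.95, 69.95, 17),
    (69.95, 71.95, 12),
    (71.95, 73.95, 7),
    (73.95, 75.95, 1),
]
total_players = sum(freq for _, _, freq in height_table)

def percent_between(low, high):
    """Percentage of heights in the rows lying entirely within low to high."""
    count = sum(freq for lo, hi, freq in height_table
                if lo >= low and hi <= high)
    return 100 * count / total_players

print(percent_between(59.95, 65.95))  # Example 1.8: prints 23.0
print(percent_between(61.95, 65.95))  # Example 1.9: prints 18.0
```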


Example  1.10 

Use the table of heights of the 100 male semiprofessional soccer players. Fill in the blanks and
check your answers.

1.  The  percentage  of  heights  that  are  from  67.95  to  71.95  inches  is: 

2.  The  percentage  of  heights  that  are  from  67.95  to  73.95  inches  is: 

3.  The  percentage  of  heights  that  are  more  than  65.95  inches  is: 

4.  The  number  of  players  in  the  sample  who  are  between  61.95  and  71.95  inches  tall  is: 

5.  What  kind  of  data  are  the  heights? 

6.  Describe  how  you  could  gather  this  data  (the  heights)  so  that  the  data  are  characteristic  of  all 
male  semiprofessional  soccer  players. 

Remember,  you  count  frequencies.  To  find  the  relative  frequency,  divide  the  frequency  by  the 
total  number  of  data  values.  To  find  the  cumulative  relative  frequency,  add  all  of  the  previous 
relative  frequencies  to  the  relative  frequency  for  the  current  row. 


1.9.1  Optional  Collaborative  Classroom  Exercise 

Exercise  1.9.1 

In your class, have someone conduct a survey of the number of siblings (brothers and sisters) each
student has. Create a frequency table. Add to it a relative frequency column and a cumulative
relative frequency column. Answer the following questions:

1. What percentage of the students in your class has 0 siblings?

2.  What  percentage  of  the  students  has  from  1  to  3  siblings? 

3.  What  percentage  of  the  students  has  fewer  than  3  siblings? 

Example  1.11 

Nineteen people were asked how many miles, to the nearest mile, they commute to work each
day. The data are as follows:

2; 5; 7; 3; 2; 10; 18; 15; 20; 7; 10; 18; 5; 12; 13; 12; 4; 5; 10

The following table was produced:

Frequency of Commuting Distances

DATA    FREQUENCY    RELATIVE FREQUENCY    CUMULATIVE RELATIVE FREQUENCY
3       3            3/19                  0.1579
4       1            1/19                  0.2105
5       3            3/19                  0.1579
7       2            2/19                  0.2632
10      3            4/19                  0.4737
12      2            2/19                  0.7895
13      1            1/19                  0.8421
15      1            1/19                  0.8948
18      1            1/19                  0.9474
20      1            1/19                  1.0000

Table 1.19


Problem  (Solution  on  p.  55.) 

1.  Is  the  table  correct?  If  it  is  not  correct,  what  is  wrong? 

2.  True  or  False:  Three  percent  of  the  people  surveyed  commute  3  miles.  If  the  statement  is  not 
correct,  what  should  it  be?  If  the  table  is  incorrect,  make  the  corrections. 

3.  What  fraction  of  the  people  surveyed  commute  5  or  7  miles? 

4. What fraction of the people surveyed commute 12 miles or more? Less than 12 miles?
Between 5 and 13 miles (does not include 5 and 13 miles)?




1.10 Summary

Statistics

• Deals with the collection, analysis, interpretation, and presentation of data

Probability

• Mathematical tool used to study randomness

Key  Terms 

•  Population 

•  Parameter 

•  Sample 

•  Statistic 

•  Variable 

•  Data 

Types  of  Data 

• Quantitative Data (a number)

    • Discrete (You count it.)

    • Continuous (You measure it.)

• Qualitative Data (a category, words)

Sampling

•  With  Replacement:  A  member  of  the  population  may  be  chosen  more  than  once 

•  Without  Replacement:  A  member  of  the  population  may  be  chosen  only  once 

Random  Sampling 

• Each member of the population has an equal chance of being selected

Sampling  Methods 

• Random

    • Simple random sample

    • Stratified sample

    • Cluster sample

    • Systematic sample

• Not Random

    • Convenience sample

Frequency  (freq.  or  f) 

•  The  number  of  times  an  answer  occurs 

Relative  Frequency  (rel.  freq.  or  RF) 

•  The  proportion  of  times  an  answer  occurs 

•  Can  be  interpreted  as  a  fraction,  decimal,  or  percent 

Cumulative  Relative  Frequencies  (cum.  rel.  freq.  or  cum  RF) 

•  An  accumulation  of  the  previous  relative  frequencies 

This content is available online at <http://cnx.org/content/m16023/1.10/>.

1.11 Practice: Sampling and Data

1.11.1  Student  Learning  Outcomes 

•  The  student  will  construct  frequency  tables. 

•  The  student  will  differentiate  between  key  terms. 

•  The  student  will  compare  sampling  techniques. 


1.11.2  Given 

Studies  are  often  done  by  pharmaceutical  companies  to  determine  the  effectiveness  of  a  treatment  program. 
Suppose  that  a  new  AIDS  antibody  drug  is  currently  under  study.  It  is  given  to  patients  once  the  AIDS 
symptoms have revealed themselves. Of interest is the average (mean) length of time in months patients
live  once  starting  the  treatment.  Two  researchers  each  follow  a  different  set  of  40  AIDS  patients  from  the 
start  of  treatment  until  their  deaths.  The  following  data  (in  months)  are  collected. 

Researcher A: 3; 4; 11; 15; 16; 17; 22; 44; 37; 16; 14; 24; 25; 15; 26; 27; 33; 29; 35; 44; 13; 21; 22; 10; 12; 8; 40; 32;
26; 27; 31; 34; 29; 17; 8; 24; 18; 47; 33; 34

Researcher B: 3; 14; 11; 5; 16; 17; 28; 41; 31; 18; 14; 14; 26; 25; 21; 22; 31; 2; 35; 44; 23; 21; 21; 16; 12; 18; 41; 22;
16; 25; 33; 34; 29; 13; 18; 24; 23; 42; 33; 29

1.11.3  Organize  the  Data 

Complete  the  tables  below  using  the  data  provided. 


Researcher A

Survival Length (in months)    Frequency    Relative Frequency    Cumulative Relative Frequency
0.5 - 6.5
6.5 - 12.5
12.5 - 18.5
18.5 - 24.5
24.5 - 30.5
30.5 - 36.5
36.5 - 42.5
42.5 - 48.5

Table 1.20


Researcher B

Survival Length (in months)    Frequency    Relative Frequency    Cumulative Relative Frequency
0.5 - 6.5
6.5 - 12.5
12.5 - 18.5
18.5 - 24.5
24.5 - 30.5
30.5 - 36.5
36.5 - 42.5
42.5 - 48.5

Table 1.21


1.11.4  Key  Terms 

Define  the  key  terms  based  upon  the  above  example  for  Researcher  A. 

Exercise 1.11.1

Population

Exercise 1.11.2

Sample

Exercise 1.11.3

Parameter

Exercise 1.11.4

Statistic

Exercise 1.11.5

Variable

Exercise 1.11.6

Data

1.11.5 Discussion Questions

Discuss  the  following  questions  and  then  answer  in  complete  sentences. 

Exercise 1.11.7

List two reasons why the data may differ.

Exercise 1.11.8

Can you tell if one researcher is correct and the other one is incorrect? Why?

Exercise 1.11.9

Would you expect the data to be identical? Why or why not?

Exercise 1.11.10

How could the researchers gather random data?

Exercise 1.11.11

Suppose that the first researcher conducted his survey by randomly choosing one state in the
nation and then randomly picking 40 patients from that state. What sampling method would that
researcher have used?

Exercise 1.11.12

Suppose that the second researcher conducted his survey by choosing 40 patients he knew. What
sampling method would that researcher have used? What concerns would you have about this
data set, based upon the data collection method?

1.12 Homework

Exercise  1.12.1  (Solution  on  p.  55.) 

For  each  item  below: 

i. Identify the type of data (quantitative - discrete, quantitative - continuous, or qualitative) that
would be used to describe a response.

ii.  Give  an  example  of  the  data. 

a.  Number  of  tickets  sold  to  a  concert 

b.  Amount  of  body  fat 

c.  Favorite  baseball  team 

d.  Time  in  line  to  buy  groceries 

e.  Number  of  students  enrolled  at  Evergreen  Valley  College 

f. Most-watched television show

g.  Brand  of  toothpaste 

h.  Distance  to  the  closest  movie  theatre 

i.  Age  of  executives  in  Fortune  500  companies 

j.  Number  of  competing  computer  spreadsheet  software  packages 

Exercise  1.12.2 

Fifty part-time students were asked how many courses they were taking this term. The
(incomplete) results are shown below:


Part-time Student Course Loads

# of Courses    Frequency    Relative Frequency    Cumulative Relative Frequency
1               30           0.6
2               15
3

Table 1.22


a.  Fill  in  the  blanks  in  the  table  above. 

b.  What  percent  of  students  take  exactly  two  courses? 

c.  What  percent  of  students  take  one  or  two  courses? 

Exercise  1.12.3  (Solution  on  p.  55.) 

Sixty  adults  with  gum  disease  were  asked  the  number  of  times  per  week  they  used  to  floss  before 
their diagnoses. The (incomplete) results are shown below:

Flossing Frequency for Adults with Gum Disease

# Flossing per Week    Frequency    Relative Frequency    Cumulative Relative Freq.
0                      27           0.4500
1                      18
3                                                         0.9333
6                      3            0.0500
7                      1            0.0167

Table 1.23

This content is available online at <http://cnx.org/content/m16010/1.19/>.

a.  Fill  in  the  blanks  in  the  table  above. 

b.  What  percent  of  adults  flossed  six  times  per  week? 

c.  What  percent  flossed  at  most  three  times  per  week? 

Exercise  1.12.4 

A fitness center is interested in the mean amount of time a client exercises in the center each week.
Define  the  following  in  terms  of  the  study.  Give  examples  where  appropriate. 

a.  Population 

b.  Sample 

c.  Parameter 

d.  Statistic 

e.  Variable 

f.  Data 

Exercise  1.12.5  (Solution  on  p.  55.) 

Ski  resorts  are  interested  in  the  mean  age  that  children  take  their  first  ski  and  snowboard  lessons. 
They  need  this  information  to  optimally  plan  their  ski  classes.  Define  the  following  in  terms  of  the 
study.  Give  examples  where  appropriate. 

a.  Population 

b.  Sample 

c.  Parameter 

d.  Statistic 

e.  Variable 

f.  Data 

Exercise  1.12.6 

A  cardiologist  is  interested  in  the  mean  recovery  period  for  her  patients  who  have  had  heart 
attacks.  Define  the  following  in  terms  of  the  study.  Give  examples  where  appropriate. 

a.  Population 

b.  Sample 

c.  Parameter 

d.  Statistic 

e.  Variable 

f.  Data 

Exercise  1.12.7  (Solution  on  p.  56.) 

Insurance  companies  are  interested  in  the  mean  health  costs  each  year  for  their  clients,  so  that 
they can determine the costs of health insurance. Define the following in terms of the study. Give
examples  where  appropriate. 

a.  Population 

b.  Sample 

c.  Parameter 

d.  Statistic 

e.  Variable 

f. Data

Exercise 1.12.8

A  politician  is  interested  in  the  proportion  of  voters  in  his  district  that  think  he  is  doing  a  good 
job.  Define  the  following  in  terms  of  the  study.  Give  examples  where  appropriate. 

a.  Population 

b.  Sample 

c.  Parameter 

d.  Statistic 

e.  Variable 

f.  Data 

Exercise  1.12.9  (Solution  on  p.  56.) 

A marriage counselor is interested in the proportion of the clients she counsels who stay married.
Define  the  following  in  terms  of  the  study.  Give  examples  where  appropriate. 

a.  Population 

b.  Sample 

c.  Parameter 

d.  Statistic 

e.  Variable 

f.  Data 

Exercise  1.12.10 

Political  pollsters  may  be  interested  in  the  proportion  of  people  that  will  vote  for  a  particular 
cause.  Define  the  following  in  terms  of  the  study.  Give  examples  where  appropriate. 

a.  Population 

b.  Sample 

c.  Parameter 

d.  Statistic 

e.  Variable 

f.  Data 

Exercise  1.12.11  (Solution  on  p.  56.) 

A  marketing  company  is  interested  in  the  proportion  of  people  that  will  buy  a  particular  product. 
Define the following in terms of the study. Give examples where appropriate.

a.  Population 

b.  Sample 

c.  Parameter 

d.  Statistic 

e.  Variable 

f.  Data 

Exercise  1.12.12 

Airline companies are interested in the consistency of the number of babies on each flight, so that
they have adequate safety equipment. Suppose an airline conducts a survey. Over Thanksgiving
weekend, it surveys 6 flights from Boston to Salt Lake City to determine the number of babies on
the flights. It determines the amount of safety equipment needed by the result of that study.

a. Using complete sentences, list three things wrong with the way the survey was conducted.

b. Using complete sentences, list three ways that you would improve the survey if it were to be
repeated.

Exercise 1.12.13

Suppose  you  want  to  determine  the  mean  number  of  students  per  statistics  class  in  your  state. 
Describe  a  possible  sampling  method  in  3  -  5  complete  sentences.  Make  the  description  detailed. 

Exercise  1.12.14 

Suppose  you  want  to  determine  the  mean  number  of  cans  of  soda  drunk  each  month  by  persons 
in  their  twenties.  Describe  a  possible  sampling  method  in  3  -  5  complete  sentences.  Make  the 
description  detailed. 

Exercise  1.12.15  (Solution  on  p.  56.) 

771 distance learning students at Long Beach City College responded to surveys in the 2010-11
academic year. Highlights of the summary report are listed in the table below. (Source:
http://de.lbcc.edu/reports/2010-11/future/highlights.html#focus).

LBCC Distance Learning Survey Results

Have computer at home                               96%
Unable to come to campus for classes                65%
Age 41 or over                                      24%
Would like LBCC to offer more DL courses            95%
Took DL classes due to a disability                 17%
Live at least 16 miles from campus                  13%
Took DL courses to fulfill transfer requirements    71%

Table 1.24


a.  What  percent  of  the  students  surveyed  do  not  have  a  computer  at  home? 

b.  About  how  many  students  in  the  survey  live  at  least  16  miles  from  campus? 

c. If the same survey was done at Great Basin College in Elko, Nevada, do you think the
percentages would be the same? Why?

Exercise  1.12.16 

Nineteen immigrants to the U.S. were asked how many years, to the nearest year, they have lived
in  the  U.S.  The  data  are  as  follows: 

2;  5;  7;  2;  2;  10;  20;  15;  0;  7;  0;  20;  5;  12;  15;  12;  4;  5;  10 

The following table was produced:

Frequency of Immigrant Survey Responses

Data    Frequency    Relative Frequency    Cumulative Relative Frequency
0       2            2/19                  0.1053
2       3            3/19                  0.2632
4       1            1/19                  0.3158
5       3            3/19                  0.1579
7       2            2/19                  0.5789
10      2            2/19                  0.6842
12      2            2/19                  0.7895
15      1            1/19                  0.8421
20      1            1/19                  1.0000

Table 1.25


a. Fix the errors on the table. Also, explain how someone might have arrived at the incorrect
number(s).

b. Explain what is wrong with this statement: "47 percent of the people surveyed have lived in
the U.S. for 5 years."

c.  Fix  the  statement  above  to  make  it  correct. 

d.  What  fraction  of  the  people  surveyed  have  lived  in  the  U.S.  5  or  7  years? 

e.  What  fraction  of  the  people  surveyed  have  lived  in  the  U.S.  at  most  12  years? 

f. What fraction of the people surveyed have lived in the U.S. fewer than 12 years?

g.  What  fraction  of  the  people  surveyed  have  lived  in  the  U.S.  from  5  to  20  years,  inclusive? 

Exercise  1.12.17 

A  "random  survey"  was  conducted  of  3274  people  of  the  "microprocessor  generation"  (people 
born  since  1971,  the  year  the  microprocessor  was  invented).  It  was  reported  that  48%  of  those 
individuals  surveyed  stated  that  if  they  had  $2000  to  spend,  they  would  use  it  for  computer 
equipment.  Also,  66%  of  those  surveyed  considered  themselves  relatively  savvy  computer  users. 
(Source: San Jose Mercury News)

a.  Do  you  consider  the  sample  size  large  enough  for  a  study  of  this  type?  Why  or  why  not? 

b. Based on your "gut feeling," do you believe the percents accurately reflect the U.S. population
for those individuals born since 1971? If not, do you think the percents of the population are
actually higher or lower than the sample statistics? Why?

Additional information: The survey was reported by Intel Corporation of individuals who visited
the Los Angeles Convention Center to see the Smithsonian Institution's road show called "America's
Smithsonian."

c. With this additional information, do you feel that all demographic and ethnic groups were
equally represented at the event? Why or why not?

d. With the additional information, comment on how accurately you think the sample statistics
reflect the population parameters.

Exercise 1.12.18

a. List some practical difficulties involved in getting accurate results from a telephone survey.

b. List some practical difficulties involved in getting accurate results from a mailed survey.

c. With your classmates, brainstorm some ways to overcome these problems if you needed to
conduct a phone or mail survey.


1.12.1  Try  these  multiple  choice  questions 

The  next  four  questions  refer  to  the  following:  A  Lake  Tahoe  Community  College  instructor  is  interested 
in  the  mean  number  of  days  Lake  Tahoe  Community  College  math  students  are  absent  from  class  during  a 
quarter. 

Exercise  1.12.19  (Solution  on  p.  56.) 

What  is  the  population  she  is  interested  in? 

A.  All  Lake  Tahoe  Community  College  students 

B.  All  Lake  Tahoe  Community  College  English  students 

C.  All  Lake  Tahoe  Community  College  students  in  her  classes 

D. All Lake Tahoe Community College math students

Exercise  1.12.20  (Solution  on  p.  56.) 

Consider  the  following: 

X  =  number  of  days  a  Lake  Tahoe  Community  College  math  student  is  absent 

In  this  case,  X  is  an  example  of  a: 

A.  Variable 

B.  Population 

C.  Statistic 

D.  Data 


Exercise  1.12.21  (Solution  on  p.  56.) 

The  instructor  takes  her  sample  by  gathering  data  on  5  randomly  selected  students  from  each 
Lake Tahoe Community College math class. The type of sampling she used is

A.  Cluster  sampling 

B.  Stratified  sampling 

C.  Simple  random  sampling 

D.  Convenience  sampling 

Exercise  1.12.22  (Solution  on  p.  56.) 

The instructor's sample produces a mean number of days absent of 3.5 days. This value is an
example  of  a 

A.  Parameter 

B.  Data 

C.  Statistic 

D.  Variable 


The next two questions refer to the following relative frequency table on hurricanes that have made direct
hits on the U.S. between 1851 and 2004. Hurricanes are given a strength category rating based on the
minimum wind speed generated by the storm. (http://www.nhc.noaa.gov/gifs/table5.gif)

Frequency of Hurricane Direct Hits

Category    Number of Direct Hits    Relative Frequency    Cumulative Frequency
1           109                      0.3993                0.3993
2           72                       0.2637                0.6630
3           71                       0.2601
4           18                                             0.9890
5           3                        0.0110                1.0000
            Total = 273

Table 1.26

Exercise 1.12.23 (Solution on p. 56.)

What is the relative frequency of direct hits that were category 4 hurricanes?

A.  0.0768 

B.  0.0659 

C.  0.2601 

D.  Not  enough  information  to  calculate 

Exercise  1.12.24  (Solution  on  p.  56.) 

What  is  the  relative  frequency  of  direct  hits  that  were  AT  MOST  a  category  3  storm? 

A.  0.3480 

B.  0.9231 

C.  0.2601 

D.  0.3370 

The  next  three  questions  refer  to  the  following:  A  study  was  done  to  determine  the  age,  number  of  times 
per  week  and  the  duration  (amount  of  time)  of  resident  use  of  a  local  park  in  San  Jose.  The  first  house  in 
the  neighborhood  around  the  park  was  selected  randomly  and  then  every  8th  house  in  the  neighborhood 
around  the  park  was  interviewed. 

Exercise  1.12.25  (Solution  on  p.  56.) 

"Number of times per week" is what type of data?

A.  qualitative 

B.  quantitative  -  discrete 

C.  quantitative  -  continuous 

Exercise 1.12.26 (Solution on p. 56.)

The sampling method was:

A. simple random

B. systematic

C. stratified

D. cluster


Exercise 1.12.27 (Solution on p. 56.)

"Duration (amount of time)" is what type of data?

A. qualitative

B.  quantitative  -  discrete 

C.  quantitative  -  continuous 

Exercises  28  and  29  are  not  multiple  choice  exercises. 

Exercise  1.12.28  (Solution  on  p.  56.) 

Name  the  sampling  method  used  in  each  of  the  following  situations: 

A. A woman in the airport is handing out questionnaires to travelers asking them to evaluate the
airport's service. She does not ask travelers who are hurrying through the airport with their
hands full of luggage, but instead asks all travelers sitting near gates and who are not taking
naps while they wait.

B. A teacher wants to know if her students are doing homework so she randomly selects rows 2
and 5, and then calls on all students in row 2 and all students in row 5 to present the solution
to homework problems to the class.

C. The marketing manager for an electronics chain store wants information about the ages of its
customers. Over the next two weeks, at each store location, 100 randomly selected customers
are given questionnaires to fill out that ask for information about age, as well as about
other variables of interest.

D. The librarian at a public library wants to determine what proportion of the library users are
children. The librarian has a tally sheet on which she marks whether the books are checked
out by an adult or a child. She records this data for every 4th patron who checks out books.

E. A political party wants to know the reaction of voters to a debate between the candidates. The
day after the debate, the party's polling staff calls 1200 randomly selected phone numbers.
If a registered voter answers the phone or is available to come to the phone, that registered
voter is asked who he/she intends to vote for and whether the debate changed his/her
opinion of the candidates.

**  Contributed  by  Roberta  Bloom 

Exercise  1.12.29  (Solution  on  p.  57.) 

Several online textbook retailers advertise that they have lower prices than on-campus bookstores.
However, an important factor is whether the internet retailers actually have the textbooks
that students need in stock. Students need to be able to get textbooks promptly at the beginning of
the college term. If the book is not available, then a student would not be able to get the textbook
at all, or might get a delayed delivery if the book is back ordered.

A  college  newspaper  reporter  is  investigating  textbook  availability  at  online  retailers.  He 
decides  to  investigate  one  textbook  for  each  of  the  following  7  subjects:  calculus,  biology, 
chemistry,  physics,  statistics,  geology,  and  general  engineering.  He  consults  textbook  industry 
sales  data  and  selects  the  most  popular  nationally  used  textbook  in  each  of  these  subjects.  He 
visits  websites  for  a  random  sample  of  major  online  textbook  sellers  and  looks  up  each  of  these  7 
textbooks  to  see  if  they  are  available  in  stock  for  quick  delivery  through  these  retailers.  Based  on 
his  investigation,  he  writes  an  article  in  which  he  draws  conclusions  about  the  overall  availability 
of  all  college  textbooks  through  online  textbook  retailers. 

Write an analysis of his study that addresses the following issues: Is his sample representative
of the population of all college textbooks? Explain why or why not. Describe some possible
sources of bias in this study, and how they might affect the results of the study. Give some
suggestions about what could be done to improve the study.

**  Contributed  by  Roberta  Bloom 
1.13 Lab 1: Data Collection

Class  Time: 
Names: 


1.13.1  Student  Learning  Outcomes 

•  The  student  will  demonstrate  the  systematic  sampling  technique. 

•  The  student  will  construct  Relative  Frequency  Tables. 

• The student will interpret results and their differences from different data groupings.


1.13.2  Movie  Survey 

Ask  five  classmates  from  a  different  class  how  many  movies  they  saw  last  month  at  the  theater.  Do  not 
include  rented  movies. 


1.  Record  the  data 

2.  In  class,  randomly  pick  one  person.  On  the  class  list,  mark  that  person's  name.  Move  down  four 
people's  names  on  the  class  list.  Mark  that  person's  name.  Continue  doing  this  until  you  have  marked 
12  people's  names.  You  may  need  to  go  back  to  the  start  of  the  list.  For  each  marked  name  record 
below  the  five  data  values.  You  now  have  a  total  of  60  data  values. 

3.  For  each  name  marked,  record  the  data: 


Table  1.27 
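
The marking procedure in step 2 is a systematic sample that wraps around the class list. The following is a minimal Python sketch of that idea, not part of the original lab. The 30-name roster, the fixed starting position, and the function name systematic_sample are all assumptions made for illustration.

```python
import random

def systematic_sample(names, step, size, start=None):
    """Mark every `step`-th name, wrapping to the top of the list when needed."""
    if start is None:
        start = random.randrange(len(names))  # randomly pick the first person
    # Move down `step` names each time; the modulo wraps past the end of the list.
    return [names[(start + i * step) % len(names)] for i in range(size)]

# Hypothetical class list; a real lab would use the actual roster.
class_list = [f"Student {i}" for i in range(1, 31)]

# start=0 is fixed here only so the sketch is repeatable; omit it for a random start.
marked = systematic_sample(class_list, step=4, size=12, start=0)
print(marked)
```

If the list length is small relative to the step, wrapping can land on a name that is already marked, so a real procedure would need a rule for skipping repeats.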


1.13.3  Order  the  Data 

Complete  the  two  relative  frequency  tables  below  using  your  class  data. 
^*This content is available online at <http://cnx.org/content/m16004/1.11/>.


Available for free at Connexions <http://cnx.org/content/col10522/1.40>




Frequency of Number of Movies Viewed

Number of Movies   Frequency   Relative Frequency   Cumulative Relative Frequency
0
1
2
3
4
5
6
7+

Table  1.28 

Frequency of Number of Movies Viewed

Number of Movies   Frequency   Relative Frequency   Cumulative Relative Frequency
0-1
2-3
4-5
6-7+

Table  1.29 


1.  Using  the  tables,  find  the  percent  of  data  that  is  at  most  2.  Which  table  did  you  use  and  why? 

2.  Using  the  tables,  find  the  percent  of  data  that  is  at  most  3.  Which  table  did  you  use  and  why? 

3.  Using  the  tables,  find  the  percent  of  data  that  is  more  than  2.  Which  table  did  you  use  and  why? 

4.  Using  the  tables,  find  the  percent  of  data  that  is  more  than  3.  Which  table  did  you  use  and  why? 
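
The two tables can be filled in by hand, but a short Python sketch may help show how the ungrouped and grouped versions relate. The survey answers below are made up for illustration (they are not real class data), and the function name relative_frequency_table is an assumption.

```python
from collections import Counter

def relative_frequency_table(values):
    """Return rows of (value, frequency, relative frequency, cumulative relative frequency)."""
    counts = Counter(values)
    n = len(values)
    rows, cumulative = [], 0
    for value in sorted(counts):
        cumulative += counts[value]
        rows.append((value, counts[value], counts[value] / n, cumulative / n))
    return rows

# Hypothetical survey answers (number of movies seen last month).
movies = [0, 1, 1, 2, 2, 2, 3, 3, 4, 6]
for row in relative_frequency_table(movies):
    print(row)

# Regroup into the paired classes by mapping each value to its class label.
labels = ["0-1", "2-3", "4-5", "6-7+"]
grouped = [labels[min(v // 2, 3)] for v in movies]
for row in relative_frequency_table(grouped):
    print(row)
```

In the ungrouped table the cumulative column at 2 answers "at most 2" directly. The grouped table can only answer questions whose cutoff falls at the end of a class, such as "at most 3".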

1.13.4  Discussion  Questions 

1.  Is  one  of  the  tables  above  "more  correct"  than  the  other?  Why  or  why  not? 

2.  In  general,  why  would  someone  group  the  data  in  different  ways?  Are  there  any  advantages  to  either 
way  of  grouping  the  data? 

3.  Why  did  you  switch  between  tables,  if  you  did,  when  answering  the  question  above? 
1.14 Lab 2: Sampling Experiment

Class  Time: 
Names: 

1.14.1  Student  Learning  Outcomes 

•  The  student  will  demonstrate  the  simple  random,  systematic,  stratified,  and  cluster  sampling  tech- 
niques. 

• The student will explain each of the details of each procedure used.

In this lab, you will be asked to pick several random samples. In each case, describe your procedure briefly,
including how you might have used the random number generator, and then list the restaurants in the
sample you obtained.

NOTE:  The  following  section  contains  restaurants  stratified  by  city  into  columns  and  grouped 
horizontally  by  entree  cost  (clusters). 

1.14.2  A  Simple  Random  Sample 

Pick  a  simple  random  sample  of  15  restaurants. 


1. Describe the procedure:

2.

1.          6.          11.
2.          7.          12.
3.          8.          13.
4.          9.          14.
5.          10.         15.

Table  1.30 
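
One possible way to carry out this step with a random number generator is sketched below in Python. It assumes each restaurant in Table 1.35 has been numbered from 1 to 84 (the number of names listed in that table); the numbering scheme and the fixed seed are assumptions for illustration, not part of the lab.

```python
import random

# Assumed numbering: each restaurant in Table 1.35 gets an ID from 1 to N.
N = 84

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable

# random.sample draws without replacement, so no restaurant is picked twice.
sample_ids = sorted(rng.sample(range(1, N + 1), k=15))
print(sample_ids)
```

Because every set of 15 IDs is equally likely to be drawn, this matches the definition of a simple random sample.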


1.14.3  A  Systematic  Sample 

Pick  a  systematic  sample  of  15  restaurants. 

1. Describe the procedure:

2.

1.          6.          11.
2.          7.          12.
3.          8.          13.
4.          9.          14.
5.          10.         15.

Table  1.31 


^This content is available online at <http://cnx.org/content/m16013/1.15/>.



1.14.4  A  Stratified  Sample 

Pick a stratified sample, by city, of 20 restaurants. Use 25% of the restaurants from each stratum. Round to
the nearest whole number.

1. Describe the procedure:

2.

1.          6.          11.         16.
2.          7.          12.         17.
3.          8.          13.         18.
4.          9.          14.         19.
5.          10.         15.         20.

Table  1.32 
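
The by-city version of this step can be sketched in Python as follows. The city sizes are counted from Table 1.35 as it appears in this section; the function name stratified_sample, the use of positions within each city's list, and the fixed seed are assumptions for illustration.

```python
import random

# Restaurants per city, counted from Table 1.35.
city_sizes = {"San Jose": 13, "Palo Alto": 12, "Los Gatos": 9, "Mountain View": 12,
              "Cupertino": 12, "Sunnyvale": 13, "Santa Clara": 13}

def stratified_sample(sizes, fraction, rng):
    """Draw a simple random sample of round(fraction * size) units from every stratum."""
    sample = {}
    for stratum, size in sizes.items():
        # Nearest whole number; note that Python's round() sends exact halves
        # to the nearest even integer, which matters if 25% lands on .5.
        k = round(fraction * size)
        picks = rng.sample(range(1, size + 1), k)  # positions within the stratum's list
        sample[stratum] = sorted(picks)
    return sample

rng = random.Random(7)  # fixed seed only so the sketch is repeatable
by_city = stratified_sample(city_sizes, 0.25, rng)
print(by_city)
print(sum(len(picks) for picks in by_city.values()))
```

Each city contributes its own simple random sample, so every stratum is represented in proportion to its size.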


1.14.5  A  Stratified  Sample 

Pick a stratified sample, by entree cost, of 21 restaurants. Use 25% of the restaurants from each stratum.
Round  to  the  nearest  whole  number. 

1. Describe the procedure:

2.

1.          6.          11.         16.
2.          7.          12.         17.
3.          8.          13.         18.
4.          9.          14.         19.
5.          10.         15.         20.
21.

Table  1.33 


1.14.6  A  Cluster  Sample 

Pick a cluster sample of restaurants from two cities. The number of restaurants will vary.

1. Describe the procedure:

2.

1.          6.          11.         16.         21.
2.          7.          12.         17.         22.
3.          8.          13.         18.         23.
4.          9.          14.         19.         24.
5.          10.         15.         20.         25.

Table  1.34 
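
A cluster sample chooses whole groups at random and then keeps every member of the chosen groups. The Python sketch below illustrates this with three of the seven cities from Table 1.35, each treated as one cluster; limiting it to three cities and fixing the seed are simplifications made for illustration.

```python
import random

# Three cities from Table 1.35, each treated as one cluster (all price columns merged).
# The other four cities are omitted only to keep the sketch short.
clusters = {
    "Los Gatos": ["Mary's Patio", "Mount Everest", "Sweet Pea's", "Andele Taqueria",
                  "Lindsey's", "Willow Street", "Toll House", "Charter House",
                  "La Maison Du Cafe"],
    "Mountain View": ["Maharaja", "New Ma's", "Thai-Rific", "Garden Fresh",
                      "Amber Indian", "La Fiesta", "Fiesta del Mar", "Dawit",
                      "Austin's", "Shiva's", "Mazeh", "Le Petit Bistro"],
    "Cupertino": ["Hobees", "Hung Fu", "Samrat", "Panda Express",
                  "Santa Barb. Grill", "Mand. Gourmet", "Bombay Oven", "Kathmandu West",
                  "Fontana's", "Blue Pheasant", "Hamasushi", "Helios"],
}

rng = random.Random(11)  # fixed seed only so the sketch is repeatable

# Randomly choose two whole clusters, then keep every restaurant in them.
chosen_cities = rng.sample(sorted(clusters), k=2)
cluster_sample = [name for city in chosen_cities for name in clusters[city]]
print(chosen_cities, len(cluster_sample))
```

The sample size depends on which cities are drawn, which is why the lab says the number of restaurants will vary.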


1.14.7  Restaurants  Stratified  by  City  and  Entree  Cost 

Restaurants  Used  in  Sample 




Entree cost columns: under $10; $10 to under $15; $15 to under $20; over $20

San Jose
  Under $10: El Abuelo Taq, Pasta Mia, Emma's Express, Bamboo Hut
  $10 to under $15: Emperor's Guard, Creekside Inn
  $15 to under $20: Agenda, Gervais, Miro's
  Over $20: Blake's, Eulipia, Hayes Mansion, Germania

Palo Alto
  Under $10: Senor Taco, Olive Garden, Taxi's
  $10 to under $15: Ming's, P.A. Joe's, Stickney's
  $15 to under $20: Scott's Seafood, Poolside Grill, Fish Market
  Over $20: Sundance Mine, Maddalena's, Spago's

Los Gatos
  Under $10: Mary's Patio, Mount Everest, Sweet Pea's, Andele Taqueria
  $10 to under $15: Lindsey's, Willow Street
  $15 to under $20: Toll House
  Over $20: Charter House, La Maison Du Cafe

Mountain View
  Under $10: Maharaja, New Ma's, Thai-Rific, Garden Fresh
  $10 to under $15: Amber Indian, La Fiesta, Fiesta del Mar, Dawit
  $15 to under $20: Austin's, Shiva's, Mazeh
  Over $20: Le Petit Bistro

Cupertino
  Under $10: Hobees, Hung Fu, Samrat, Panda Express
  $10 to under $15: Santa Barb. Grill, Mand. Gourmet, Bombay Oven, Kathmandu West
  $15 to under $20: Fontana's, Blue Pheasant
  Over $20: Hamasushi, Helios

Sunnyvale
  Under $10: Chekijababi, Taj India, Full Throttle, Tia Juana, Lemon Grass
  $10 to under $15: Pacific Fresh, Charley Brown's, Cafe Cameroon, Faz, Aruba's
  $15 to under $20: Lion & Compass, The Palace, Beau Sejour
  Over $20: (none)

Santa Clara
  Under $10: Rangoli, Armadillo Willy's, Thai Pepper, Pasand
  $10 to under $15: Arthur's, Katie's Cafe, Pedro's, La Galleria
  $15 to under $20: Birk's, Truya Sushi, Valley Plaza
  Over $20: Lakeside, Mariani's

Table  1.35 


NOTE:  The  original  lab  was  designed  and  contributed  by  Carol  Olmstead. 
Solutions to Exercises in Chapter 1

Solution  to  Example  1.5,  Problem  (p.  19) 

Items 1, 5, 11, and 12 are quantitative discrete; items 4, 6, 10, and 14 are quantitative continuous; and items
2, 3, 7, 8, 9, and 13 are qualitative.
Solution  to  Example  1.10,  Problem  (p.  36) 

1.  29% 

2.  36% 

3.  77% 

4.  87 

5.  quantitative  continuous 

6.  get  rosters  from  each  team  and  choose  a  simple  random  sample  from  each 
Solution  to  Example  1.11,  Problem  (p.  37) 

1.  No.  Frequency  column  sums  to  18,  not  19.  Not  all  cumulative  relative  frequencies  are  correct. 

2.  False.  Frequency  for  3  miles  should  be  1;  for  2  miles  (left  out),  2.  Cumulative  relative  frequency 
column  should  read:  0.1052,  0.1579,  0.2105,  0.3684, 0.4737, 0.6316, 0.7368,  0.7895,  0.8421,  0.9474, 1. 

3. 5/19

4. 7/19, 12/19, 7/19


Solutions  to  Homework 
Solution  to  Exercise  1.12.1  (p.  42) 

a.  quantitative  -  discrete 

b.  quantitative  -  continuous 

c.  qualitative 

d.  quantitative  -  continuous 

e.  quantitative  -  discrete 

f.  qualitative 

g.  qualitative 

h.  quantitative  -  continuous 

i.  quantitative  -  continuous 
j.  quantitative  -  discrete 

Solution  to  Exercise  1.12.3  (p.  42) 

a.  Cum.  Rel.  Freq.  for  0  is  0.4500 

Rel.  Freq.  for  1  is  0.3000  and  Cum.  Rel.  Freq.  for  1  or  less  is  0.7500 
Freq.  for  3  is  11  and  Rel.  Freq.  is  0.1833 
Cum.  Rel.  Freq.  for  6  or  less  is  0.9833 
Cum.  Rel.  Freq.  for  7  or  less  is  1 

b.  5.00% 

c.  93.33% 

Solution  to  Exercise  1.12.5  (p.  43) 

a.  Children  who  take  ski  or  snowboard  lessons 

b.  A  group  of  these  children 

c.  The  population  mean 

d.  The  sample  mean 

e.  X  =  the  age  of  one  child  who  takes  the  first  ski  or  snowboard  lesson 
f. Values for X, such as 3, 7, etc.
Solution  to  Exercise  1.12.7  (p.  43) 

a.  The  clients  of  the  insurance  companies 

b.  A  group  of  the  clients 

c.  The  mean  health  costs  of  the  clients 

d.  The  mean  health  costs  of  the  sample 

e.  X  =  the  health  costs  of  one  client 

f. Values for X, such as 34, 9, 82, etc.

Solution  to  Exercise  1.12.9  (p.  44) 

a.  All  the  clients  of  the  counselor 

b.  A  group  of  the  clients 

c.  The  proportion  of  all  her  clients  who  stay  married 

d.  The  proportion  of  the  sample  who  stay  married 

e.  X  =  the  number  of  couples  who  stay  married 

f. yes, no

Solution  to  Exercise  1.12.11  (p.  44) 

a.  All  people  (maybe  in  a  certain  geographic  area,  such  as  the  United  States) 

b.  A  group  of  the  people 

c.  The  proportion  of  all  people  who  will  buy  the  product 

d.  The  proportion  of  the  sample  who  will  buy  the  product 

e.  X  =  the  number  of  people  who  will  buy  it 

f. buy, not buy

Solution  to  Exercise  1.12.15  (p.  45) 

a:  4% 
b:  100 

Solution  to  Exercise  1.12.19  (p.  47) 

D 

Solution  to  Exercise  1.12.20  (p.  47) 

A 

Solution  to  Exercise  1.12.21  (p.  47) 

B 

Solution  to  Exercise  1.12.22  (p.  47) 

C 

Solution  to  Exercise  1.12.23  (p.  48) 

B 

Solution  to  Exercise  1.12.24  (p.  48) 

B 

Solution  to  Exercise  1.12.25  (p.  48) 

B 

Solution  to  Exercise  1.12.26  (p.  48) 

B 

Solution  to  Exercise  1.12.27  (p.  48) 

C 

Solution  to  Exercise  1.12.28  (p.  49) 

A.  Convenience 
B. Cluster

C.  Stratified 

D.  Systematic 

E.  Simple  Random 

Solution  to  Exercise  1.12.29  (p.  49) 

The answer below contains some of the issues that students might discuss for this problem. Individual
students' answers may also identify other issues that pertain to this problem that are not included in the
answer below.

The sample is not representative of the population of all college textbooks. Two reasons why it is
not representative are that he only sampled 7 subjects and he only investigated one textbook in each
subject. There are several possible sources of bias in the study. The 7 subjects that he investigated are
all in mathematics and the sciences; there are many subjects in the humanities, social sciences, and
other subject areas (for example: literature, art, history, psychology, sociology, business) that he did not
investigate at all. It may be that different subject areas exhibit different patterns of textbook availability,
but his sample would not detect such results.

He also only looked at the most popular textbook in each of the subjects he investigated. The
availability of the most popular textbooks may differ from the availability of other textbooks in one of two
ways:

• the most popular textbooks may be more readily available online, because more new copies are
printed and more students nationwide are selling back their used copies, OR

•  the  most  popular  textbooks  may  be  harder  to  find  available  online,  because  more  student  demand 
exhausts  the  supply  more  quickly. 

In  reality,  many  college  students  do  not  use  the  most  popular  textbook  in  their  subject,  and  this  study  gives 
no  useful  information  about  the  situation  for  those  less  popular  textbooks. 

He  could  improve  this  study  by 

• expanding the selection of subjects he investigates so that it is more representative of all subjects
studied by college students and

• expanding the selection of textbooks he investigates within each subject to include a mixed
representation of both the popular and less popular textbooks.
Chapter 2

Descriptive  Statistics 


2.1  Descriptive  Statistics^ 

2.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

•  Display  data  graphically  and  interpret  graphs:  stemplots,  histograms  and  boxplots. 

•  Recognize,  describe,  and  calculate  the  measures  of  location  of  data:  quartiles  and  percentiles. 

•  Recognize,  describe,  and  calculate  the  measures  of  the  center  of  data:  mean,  median,  and  mode. 

•  Recognize,  describe,  and  calculate  the  measures  of  the  spread  of  data:  variance,  standard  deviation, 
and  range. 


2.1.2  Introduction 

Once  you  have  collected  data,  what  will  you  do  with  it?  Data  can  be  described  and  presented  in  many 
different  formats.  For  example,  suppose  you  are  interested  in  buying  a  house  in  a  particular  area.  You  may 
have  no  clue  about  the  house  prices,  so  you  might  ask  your  real  estate  agent  to  give  you  a  sample  data  set 
of  prices.  Looking  at  all  the  prices  in  the  sample  often  is  overwhelming.  A  better  way  might  be  to  look 
at  the  median  price  and  the  variation  of  prices.  The  median  and  variation  are  just  two  ways  that  you  will 
learn  to  describe  data.  Your  agent  might  also  provide  you  with  a  graph  of  the  data. 

In  this  chapter,  you  will  study  numerical  and  graphical  ways  to  describe  and  display  your  data.  This  area 
of  statistics  is  called  "Descriptive  Statistics".  You  will  learn  to  calculate,  and  even  more  importantly,  to 
interpret  these  measurements  and  graphs. 

2.2  Displaying  Data' 

A  statistical  graph  is  a  tool  that  helps  you  learn  about  the  shape  or  distribution  of  a  sample.  The  graph  can 
be  a  more  effective  way  of  presenting  data  than  a  mass  of  numbers  because  we  can  see  where  data  clusters 
and  where  there  are  only  a  few  data  values.  Newspapers  and  the  Internet  use  graphs  to  show  trends  and 
to  enable  readers  to  compare  facts  and  figures  quickly. 

Statisticians  often  graph  data  first  to  get  a  picture  of  the  data.  Then,  more  formal  tools  may  be  applied. 

^This content is available online at <http://cnx.org/content/m16300/1.9/>.
^This content is available online at <http://cnx.org/content/m16297/1.9/>.
Some of the types of graphs that are used to summarize and organize data are the dot plot, the bar chart,
the histogram, the stem-and-leaf plot, the frequency polygon (a type of broken line graph), pie charts, and
the boxplot. In this chapter, we will briefly look at stem-and-leaf plots, line graphs and bar graphs. Our
emphasis will be on histograms and boxplots.

2.3  Stem  and  Leaf  Graphs  (Stemplots),  Line  Graphs  and  Bar  Graphs^ 

One simple graph, the stem-and-leaf graph or stem plot, comes from the field of exploratory data
analysis. It is a good choice when the data sets are small. To create the plot, divide each observation of data into
a stem and a leaf. The leaf consists of a final significant digit. For example, 23 has stem 2 and leaf 3. Four
hundred thirty-two (432) has stem 43 and leaf 2. Five thousand four hundred thirty-two (5,432) has stem
543 and leaf 2. The decimal 9.3 has stem 9 and leaf 3. Write the stems in a vertical line from smallest to
largest. Draw a vertical line to the right of the stems. Then write the leaves in increasing order next to their
corresponding stem.

Example  2.1 

For  Susan  Dean's  spring  pre-calculus  class,  scores  for  the  first  exam  were  as  follows  (smallest  to 
largest): 

33;  42;  49;  49;  53;  55;  55;  61;  63;  67;  68;  68;  69;  69;  72;  73;  74;  78;  80;  83;  88;  88;  88;  90;  92;  94;  94;  94;  94; 
96;  100 

Stem-and-Leaf Diagram

Stem | Leaf
   3 | 3
   4 | 299
   5 | 355
   6 | 1378899
   7 | 2348
   8 | 03888
   9 | 0244446
  10 | 0

Table  2.1 


The  stem  plot  shows  that  most  scores  fell  in  the  60s,  70s,  80s,  and  90s.  Eight  out  of  the  31  scores  or 
approximately  26%  of  the  scores  were  in  the  90's  or  100,  a  fairly  high  number  of  As. 
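
The stem plot for Example 2.1 can also be produced programmatically. The following Python sketch is one possible way to do it for whole-number data; the function name stem_plot is an assumption, and the sketch skips stems that have no leaves, which a fuller version would print as empty rows.

```python
from collections import defaultdict

def stem_plot(values):
    """Group whole-number values by stem (all digits but the last) with leaves in order."""
    stems = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)   # e.g. 432 -> stem 43, leaf 2
        stems[stem].append(leaf)
    return {s: "".join(map(str, leaves)) for s, leaves in sorted(stems.items())}

# Exam scores from Example 2.1.
scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73, 74, 78,
          80, 83, 88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

for stem, leaves in stem_plot(scores).items():
    print(f"{stem:>2} | {leaves}")

# Share of scores in the 90s or at 100.
high = sum(1 for s in scores if s >= 90)
print(high, len(scores), round(100 * high / len(scores)))
```

The last line reproduces the count used in the text: 8 of the 31 scores, or about 26%.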

The stem plot is a quick way to graph and gives an exact picture of the data. You want to look for an overall
pattern and any outliers. An outlier is an observation of data that does not fit the rest of the data. It is
sometimes called an extreme value. When you graph an outlier, it will appear not to fit the pattern of the
graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500) while others may
indicate that something unusual is happening. It takes some background information to explain outliers.
In the example above, there were no outliers.

Example  2.2 

Create  a  stem  plot  using  the  data: 
^This content is available online at <http://cnx.org/content/m16849/1.17/>.

1.1; 1.5; 2.3; 2.5; 2.7; 3.2; 3.3; 3.3; 3.5; 3.8; 4.0; 4.2; 4.5; 4.5; 4.7; 4.8; 5.5; 5.6; 6.5; 6.7; 12.3

The  data  are  the  distance  (in  kilometers)  from  a  home  to  the  nearest  supermarket. 

Problem  (Solution  on  p.  114.) 

1.  Are  there  any  values  that  might  possibly  be  outliers? 

2.  Do  the  data  seem  to  have  any  concentration  of  values? 

Hint:  The  leaves  are  to  the  right  of  the  decimal. 


Another  type  of  graph  that  is  useful  for  specific  data  values  is  a  line  graph.  In  the  particular  line  graph 
shown  in  the  example,  the  x-axis  consists  of  data  values  and  the  y-axis  consists  of  frequency  points.  The 
frequency  points  are  connected. 

Example  2.3 

In  a  survey,  40  mothers  were  asked  how  many  times  per  week  a  teenager  must  be  reminded  to  do 
his/her  chores.  The  results  are  shown  in  the  table  and  the  line  graph. 


Number of times teenager is reminded   Frequency
0                                      2
1                                      5
2                                      8
3                                      14
4                                      7
5                                      4

Table  2.2 

[Figure: line graph with Number of Times Teenager is Reminded on the x-axis and frequency on the y-axis]


Bar  graphs  consist  of  bars  that  are  separated  from  each  other.  The  bars  can  be  rectangles  or  they  can  be 
rectangular  boxes  and  they  can  be  vertical  or  horizontal. 

The bar graph shown in Example 2.4 has age groups represented on the x-axis and proportions on the y-axis.

Example 2.4

By the end of 2011, in the United States, Facebook had over 146 million users. The table
shows three age groups, the number of users in each age group and the proportion (%) of
users in each age group. Source:
http://www.kenburbary.com/2011/03/facebook-demographics-revisited-2011-statistics-2/


Age groups   Number of Facebook users   Proportion (%) of Facebook users
13-25        65,082,280                 45%
26-44        53,300,200                 36%
45-64        27,885,100                 19%

Table  2.3 

[Figure: bar graph with age groups (13-25, 26-44, 45-64) on the x-axis and proportion (%) of Facebook users on the y-axis]


Example  2.5 

The  columns  in  the  table  below  contain  the  race/ethnicity  of  U.S.  Public  Schools:  High  School 
Class  of  2011,  percentages  for  the  Advanced  Placement  Examinee  Population  for  that  class 
and  percentages  for  the  Overall  Student  Population.  The  3-dimensional  graph  shows  the 
Race/Ethnicity of U.S. Public Schools (qualitative data) on the x-axis and Advanced Placement
Examinee  Population  percentages  on  the  y-axis.  (Source:  http://www.collegeboard.com  and 
Source:  http://apreport.collegeboard.org/goals-and-findings/promoting-equity) 


Race/Ethnicity                                   AP Examinee Population   Overall Student Population
1 = Asian, Asian American or Pacific Islander    10.3%                    5.7%
2 = Black or African American                    9.0%                     14.7%
3 = Hispanic or Latino                           17.0%                    17.6%
4 = American Indian or Alaska Native             0.6%                     1.1%
5 = White                                        57.1%                    59.2%
6 = Not reported/other                           6.0%                     1.7%

Table  2.4 

[Figure: bar graph titled "Ethnicity/Race vs. Percent of AP Examinees", with race/ethnicity categories 1 through 6 on the x-axis]


Go  to  Outcomes  of  Education  Figure  22^  for  an  example  of  a  bar  graph  that  shows  unemployment  rates  of 
persons  25  years  and  older  for  2009. 

NOTE:  This  book  contains  instructions  for  constructing  a  histogram  and  a  box  plot  for  the  TI-83+ 
and  TI-84  calculators.  You  can  find  additional  instructions  for  using  these  calculators  on  the  Texas 
Instruments  (TI)  website^  . 


2.4  Histograms^ 

For  most  of  the  work  you  do  in  this  book,  you  will  use  a  histogram  to  display  the  data.  One  advantage  of  a 
histogram  is  that  it  can  readily  display  large  data  sets.  A  rule  of  thumb  is  to  use  a  histogram  when  the  data 
set  consists  of  100  values  or  more. 

A  histogram  consists  of  contiguous  boxes.  It  has  both  a  horizontal  axis  and  a  vertical  axis.  The  horizontal 
axis  is  labeled  with  what  the  data  represents  (for  instance,  distance  from  your  home  to  school).  The  vertical 
axis  is  labeled  either  Frequency  or  relative  frequency.  The  graph  will  have  the  same  shape  with  either 
label.  The  histogram  (like  the  stemplot)  can  give  you  the  shape  of  the  data,  the  center,  and  the  spread  of  the 
data.  (The  next  section  tells  you  how  to  calculate  the  center  and  the  spread.) 


*http://nces.ed.gov/pubs2011/2011015_5.pdf

^http://education.ti.com/educationportal/sites/US/sectionHome/support.html
^This content is available online at <http://cnx.org/content/m16298/1.14/>.
The relative frequency is equal to the frequency for an observed value of the data divided by the total
number of data values in the sample. (In the chapter on Sampling and Data (Section 1.1), we defined
frequency as the number of times an answer occurs.) If:

• f = frequency

• n = total number of data values (or the sum of the individual frequencies), and

• RF = relative frequency,

then:

$$RF = \frac{f}{n} \qquad (2.1)$$

For example, if 3 students in Mr. Ahab's English class of 40 students received from 90% to 100%, then
$f = 3$, $n = 40$, and $RF = \frac{f}{n} = \frac{3}{40} = 0.075$.

Seven and a half percent of the students received 90% to 100%. Ninety percent to 100% are quantitative
measures.

To construct a histogram, first decide how many bars or intervals, also called classes, represent the data.
Many histograms consist of from 5 to 15 bars or classes for clarity. Choose a starting point for the first
interval to be less than the smallest data value. A convenient starting point is a lower value carried out
to one more decimal place than the value with the most decimal places. For example, if the value with the
most decimal places is 6.1 and this is the smallest value, a convenient starting point is 6.05 (6.1 - 0.05 = 6.05).
We say that 6.05 has more precision. If the value with the most decimal places is 2.23 and the lowest value
is 1.5, a convenient starting point is 1.495 (1.5 - 0.005 = 1.495). If the value with the most decimal places is
3.234 and the lowest value is 1.0, a convenient starting point is 0.9995 (1.0 - 0.0005 = 0.9995). If all the data
happen to be integers and the smallest value is 2, then a convenient starting point is 1.5 (2 - 0.5 = 1.5). Also,
when the starting point and other boundaries are carried to one additional decimal place, no data value
will fall on a boundary.
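
The rule for the convenient starting point can be written as a small Python sketch. It is only an illustration of the rule as described above: the function name convenient_start is an assumption, the values other than those quoted in the text are made up, and data values are passed as strings so that their decimal places are preserved.

```python
from decimal import Decimal

def convenient_start(values):
    """Smallest value minus half a unit in the last decimal place used by any value."""
    decimals = [Decimal(v) for v in values]
    # Largest number of decimal places among the data values.
    places = max(max(-d.as_tuple().exponent, 0) for d in decimals)
    # Offset is 0.5, 0.05, 0.005, ... : one more decimal place than the data.
    offset = Decimal(5) / Decimal(10) ** (places + 1)
    return min(decimals) - offset

print(convenient_start(["6.1", "7.3", "8.0"]))  # one decimal place, smallest value 6.1
print(convenient_start(["2.23", "1.5"]))        # two decimal places, smallest value 1.5
print(convenient_start(["3.234", "1.0"]))       # three decimal places, smallest value 1.0
print(convenient_start(["2", "5", "9"]))        # integers, smallest value 2
```

Using Decimal rather than floating-point numbers keeps results such as 1.495 exact, which matters when the whole point is that no data value should land on a boundary.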

Example  2.6 

The  following  data  are  the  heights  (in  inches  to  the  nearest  half  inch)  of  100  male  semiprofessional 
soccer  players.  The  heights  are  continuous  data  since  height  is  measured. 

60;  60.5;  61;  61;  61.5 

63.5;  63.5;  63.5 

64;  64;  64;  64;  64;  64;  64;  64.5;  64.5;  64.5;  64.5;  64.5;  64.5;  64.5;  64.5 

66;  66;  66;  66;  66;  66;  66;  66;  66;  66;  66.5;  66.5;  66.5;  66.5;  66.5;  66.5;  66.5;  66.5;  66.5;  66.5;  66.5;  67;  67; 
67;  67;  67;  67;  67;  67;  67;  67;  67;  67;  67.5;  67.5;  67.5;  67.5;  67.5;  67.5;  67.5 

68;  68;  69;  69;  69;  69;  69;  69;  69;  69;  69;  69;  69.5;  69.5;  69.5;  69.5;  69.5 

70;  70;  70;  70;  70;  70;  70.5;  70.5;  70.5;  71;  71;  71 

72;  72;  72;  72.5;  72.5;  73;  73.5 

74 

The smallest data value is 60. Since the data with the most decimal places has one decimal (for
instance, 61.5), we want our starting point to have two decimal places. Since the numbers 0.5,
0.05, 0.005, etc. are convenient numbers, use 0.05 and subtract it from 60, the smallest value, for
the convenient starting point.



60  -  0.05  =  59.95  which  is  more  precise  than,  say,  61.5  by  one  decimal  place.  The  starting  point  is, 
then,  59.95. 

The  largest  value  is  74.  74+  0.05  =  74.05  is  the  ending  value. 

Next, calculate the width of each bar or class interval. To calculate this width, subtract the starting
point from the ending value and divide by the number of bars (you must choose the number of
bars you desire). Suppose you choose 8 bars. Then $\frac{74.05 - 59.95}{8} = 1.7625 \approx 1.76$.


NOTE: We will round up to 2 and make each bar or class interval 2 units wide. Rounding up to 2 is
one way to prevent a value from falling on a boundary. Rounding to the next number is necessary
even if it goes against the standard rules of rounding. For this example, using 1.76 as the width
would also work.

The  boundaries  are: 


• 59.95
• 59.95 + 2 = 61.95
• 61.95 + 2 = 63.95
• 63.95 + 2 = 65.95
• 65.95 + 2 = 67.95
• 67.95 + 2 = 69.95
• 69.95 + 2 = 71.95
• 71.95 + 2 = 73.95
• 73.95 + 2 = 75.95

The  heights  60  through  61.5  inches  are  in  the  interval  59.95  -  61.95.  The  heights  that  are  63.5  are 
in  the  interval  61.95  -  63.95.  The  heights  that  are  64  through  64.5  are  in  the  interval  63.95  -  65.95. 
The  heights  66  through  67.5  are  in  the  interval  65.95  -  67.95.  The  heights  68  through  69.5  are  in  the 
interval  67.95  -  69.95.  The  heights  70  through  71  are  in  the  interval  69.95  -  71.95.  The  heights  72 
through  73.5  are  in  the  interval  71.95  -  73.95.  The  height  74  is  in  the  interval  73.95  -  75.95. 
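
The boundary arithmetic above can be sketched in a few lines of Python. This is only an illustration and not part of the original text: the variable names are made up for the sketch, and Decimal is used simply so that values such as 59.95 stay exact.

```python
from decimal import Decimal
import math

smallest = Decimal("60")     # smallest data value in the example
largest = Decimal("74")      # largest data value in the example
offset = Decimal("0.05")     # one more decimal place than the data
num_bars = 8                 # chosen by the person drawing the histogram

start = smallest - offset                 # starting point, 59.95
end = largest + offset                    # ending value, 74.05
raw_width = (end - start) / num_bars      # width before rounding
width = math.ceil(raw_width)              # always round up to the next number

# There is one more boundary than there are bars.
boundaries = [start + width * k for k in range(num_bars + 1)]

print(raw_width, width)
print([str(b) for b in boundaries])
```

Because the width is rounded up rather than to the nearest value, the last boundary lands beyond the ending value, so every data value falls strictly inside some interval.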

The  following  histogram  displays  the  heights  on  the  x-axis  and  relative  frequency  on  the  y-axis. 

Example 2.7

The  following  data  are  the  number  of  books  bought  by  50  part-time  college  students  at  ABC 
College.  The  number  of  books  is  discrete  data  since  books  are  counted. 

1;  1;  1;  1;  1;  1;  1;  1;  1;  1;  1 

2;  2;  2;  2;  2;  2;  2;  2;  2;  2 

3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3; 3

4; 4; 4; 4; 4; 4

5; 5; 5; 5; 5

6; 6

Eleven  students  buy  1  book.  Ten  students  buy  2  books.  Sixteen  students  buy  3  books.  Six  students 
buy  4  books.  Five  students  buy  5  books.  Two  students  buy  6  books. 

Because the data are integers, subtract 0.5 from 1, the smallest data value, and add 0.5 to 6, the
largest data value. Then the starting point is 0.5 and the ending value is 6.5.

Problem  (Solution  on  p.  114.) 

Next, calculate the width of each bar or class interval. If the data are discrete and there are not too
many different values, a width that places the data values in the middle of the bar or class interval
is the most convenient. Since the data consist of the numbers 1, 2, 3, 4, 5, 6 and the starting point is
0.5, a width of one places the 1 in the middle of the interval from 0.5 to 1.5, the 2 in the middle of
the interval from 1.5 to 2.5, the 3 in the middle of the interval from 2.5 to 3.5, the 4 in the middle of
the interval from _______ to _______, the 5 in the middle of the interval from _______ to _______,
and the _______ in the middle of the interval from _______ to _______.

Calculate the number of bars as follows:

\[ \frac{6.5 - 0.5}{\text{bars}} = 1 \tag{2.3} \]

where 1 is the width of a bar. Therefore, bars = 6.


The  following  histogram  displays  the  number  of  books  on  the  x-axis  and  the  frequency  on  the 
y-axis. 
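
As a rough illustration of the same steps, the sketch below rebuilds the 50 data values from the counts stated in the example, computes the number of bars, and prints a text version of the histogram. The variable names are assumptions made for this sketch only.

```python
from collections import Counter

# Number of books bought by the 50 students, rebuilt from the counts in the example.
books = [1] * 11 + [2] * 10 + [3] * 16 + [4] * 6 + [5] * 5 + [6] * 2

start = min(books) - 0.5      # starting point, 0.5
end = max(books) + 0.5        # ending value, 6.5
width = 1                     # puts each integer in the middle of its bar
num_bars = int((end - start) / width)

freq = Counter(books)
for k in range(num_bars):
    low = start + k * width
    high = low + width
    value = min(books) + k    # the integer centred in this interval
    print(f"{low:.1f} to {high:.1f}: {'*' * freq[value]} ({freq[value]})")
```

Each printed row corresponds to one bar, and the number of stars is the frequency that would be plotted on the y-axis.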

Using the TI-83, 83+, 84, 84+ Calculator Instructions

Go  to  the  Appendix  (14:Appendix)  in  the  menu  on  the  left.  There  are  calculator  instructions  for  entering 
data  and  for  creating  a  customized  histogram.  Create  the  histogram  for  Example  2. 

•  Press  Y=.  Press  CLEAR  to  clear  out  any  equations. 

•  Press STAT 1:EDIT. If L1 has data in it, arrow up into the name L1, press CLEAR and arrow down. If
necessary, do the same for L2.

•  Into L1, enter 1, 2, 3, 4, 5, 6

•  Into L2, enter 11, 10, 16, 6, 5, 2

•  Press WINDOW. Make Xmin = .5, Xmax = 6.5, Xscl = (6.5 - .5)/6, Ymin = -1, Ymax = 20, Yscl = 1, Xres
= 1

•  Press 2nd Y=. Start by pressing 4:PlotsOff ENTER.

•  Press 2nd Y=. Press 1:Plot1. Press ENTER. Arrow down to TYPE. Arrow to the 3rd picture
(histogram). Press ENTER.

•  Arrow down to Xlist: Enter L1 (2nd 1). Arrow down to Freq. Enter L2 (2nd 2).

•  Press  GRAPH 

•  Use  the  TRACE  key  and  the  arrow  keys  to  examine  the  histogram. 


2.4.1  Optional  Collaborative  Exercise 

Count the money (bills and change) in your pocket or purse. Your instructor will record the amounts. As a
class, construct a histogram displaying the data. Discuss how many intervals you think are appropriate. You
may want to experiment with the number of intervals. Discuss, also, the shape of the histogram.

Record  the  data,  in  dollars  (for  example,  1.25  dollars). 

Construct  a  histogram. 

2.5  Box  Plots'

Box  plots  or  box-whisker  plots  give  a  good  graphical  image  of  the  concentration  of  the  data.  They  also 
show  how  far  from  most  of  the  data  the  extreme  values  are.  The  box  plot  is  constructed  from  five  values: 
the  smallest  value,  the  first  quartile,  the  median,  the  third  quartile,  and  the  largest  value.  The  median,  the 
first  quartile,  and  the  third  quartile  will  be  discussed  here,  and  then  again  in  the  section  on  measuring  data 
in  this  chapter.  We  use  these  values  to  compare  how  close  other  data  values  are  to  them. 

The  median,  a  number,  is  a  way  of  measuring  the  "center"  of  the  data.  You  can  think  of  the  median  as  the 
"middle  value,"  although  it  does  not  actually  have  to  be  one  of  the  observed  values.  It  is  a  number  that 
separates  ordered  data  into  halves.  Half  the  values  are  the  same  number  or  smaller  than  the  median  and 
half  the  values  are  the  same  number  or  larger.  For  example,  consider  the  following  data: 

1;  11.5;  6;  7.2;  4;  8;  9;  10;  6.8;  8.3;  2;  2;  10;  1 

Ordered  from  smallest  to  largest: 

1;  1;  2;  2;  4;  6;  6.8;  7.2;  8;  8.3;  9;  10;  10;  11.5 

The median is between the 7th value, 6.8, and the 8th value, 7.2. To find the median, add the two values
together and divide by 2.

\[ \frac{6.8 + 7.2}{2} = 7 \tag{2.4} \]

The  median  is  7.  Half  of  the  values  are  smaller  than  7  and  half  of  the  values  are  larger  than  7. 

Quartiles  are  numbers  that  separate  the  data  into  quarters.  Quartiles  may  or  may  not  be  part  of  the  data. 
To  find  the  quartiles,  first  find  the  median  or  second  quartile.  The  first  quartile  is  the  middle  value  of  the 
lower  half  of  the  data  and  the  third  quartile  is  the  middle  value  of  the  upper  half  of  the  data.  To  get  the 
idea,  consider  the  same  data  set  shown  above: 

1;  1;  2;  2;  4;  6;  6.8;  7.2;  8;  8.3;  9;  10;  10;  11.5 

The  median  or  second  quartile  is  7.  The  lower  half  of  the  data  is  1, 1, 2, 2, 4,  6,  6.8.  The  middle  value  of  the 
lower  half  is  2. 

1;  1;  2;  2;  4;  6;  6.8 

The number 2, which is part of the data, is the first quartile. One-fourth of the values are the same or less
than 2 and three-fourths of the values are more than 2.

The  upper  half  of  the  data  is  7.2, 8,  8.3, 9, 10, 10, 11.5.  The  middle  value  of  the  upper  half  is  9. 
7.2;  8;  8.3;  9;  10;  10;  11.5 

The  number  9,  which  is  part  of  the  data,  is  the  third  quartile.  Three-fourths  of  the  values  are  less  than  9 
and  one-fourth  of  the  values  are  more  than  9. 
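
The procedure just described (find the median, then take the middle value of each half) can be sketched as follows. The function names are made up for this illustration, and one detail is an assumption: when the number of values is odd, the sketch leaves the median out of both halves, which agrees with the real estate example later in this chapter. Other textbooks and software use slightly different quartile rules.

```python
def median(ordered):
    """Median of a list that is already sorted from smallest to largest."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(data):
    """Return (first quartile, median, third quartile) from the two halves."""
    ordered = sorted(data)
    n = len(ordered)
    lower = ordered[: n // 2]          # values below the median position
    upper = ordered[(n + 1) // 2 :]    # values above the median position
    return median(lower), median(ordered), median(upper)


data = [1, 11.5, 6, 7.2, 4, 8, 9, 10, 6.8, 8.3, 2, 2, 10, 1]
q1, m, q3 = quartiles(data)
print(q1, m, q3)
```

For the 14 values above, the lower half is the first seven ordered values and the upper half is the last seven, exactly as in the text.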

To  construct  a  box  plot,  use  a  horizontal  number  line  and  a  rectangular  box.  The  smallest  and  largest  data 
values  label  the  endpoints  of  the  axis.  The  first  quartile  marks  one  end  of  the  box  and  the  third  quartile 
marks  the  other  end  of  the  box.  The  middle  fifty  percent  of  the  data  fall  inside  the  box.  The  "whiskers" 
extend  from  the  ends  of  the  box  to  the  smallest  and  largest  data  values.  The  box  plot  gives  a  good  quick 
picture  of  the  data. 


''This content is available online at <http://cnx.org/content/m16296/1.13/>.

NOTE: You may encounter box and whisker plots that have dots marking outlier values. In those
cases, the whiskers are not extending to the minimum and maximum values.

Consider  the  following  data: 

1;  1;  2;  2;  4;  6;  6.8;  7.2;  8;  8.3;  9;  10;  10;  11.5

The  first  quartile  is  2,  the  median  is  7,  and  the  third  quartile  is  9.  The  smallest  value  is  1  and  the  largest 
value  is  11.5.  The  box  plot  is  constructed  as  follows  (see  calculator  instructions  in  the  back  of  this  book  or 
on  the  TI  web  site^  ): 


[Figure: box plot drawn above a number line running from 1 to 11.5]


The  two  whiskers  extend  from  the  first  quartile  to  the  smallest  value  and  from  the  third  quartile  to  the 
largest  value.  The  median  is  shown  with  a  dashed  line. 

Example  2.8 

The  following  data  are  the  heights  of  40  students  in  a  statistics  class. 

59;  60;  61;  62;  62;  63;  63;  64;  64;  64;  65;  65;  65;  65;  65;  65;  65;  65;  65;  66;  66;  67;  67;  68;  68;  69;  70;  70;  70; 
70;  70;  71;  71;  72;  72;  73;  74;  74;  75;  77 

Construct  a  box  plot: 

Using  the  TI-83,  83+,  84,  84+  Calculator 

•  Enter data into the list editor (Press STAT 1:EDIT). If you need to clear the list, arrow up to
the name L1, press CLEAR, arrow down.

•  Put the data values in list L1.

•  Press STAT and arrow to CALC. Press 1:1-VarStats. Enter L1.

•  Press ENTER

•  Use the down and up arrow keys to scroll.


•  Smallest  value  =  59 

•  Largest  value  =  77 

•  Q1: First quartile = 64.5

•  Q2: Second quartile or median = 66

•  Q3: Third quartile = 70

Using  the  TI-83,  83+,  84,  84+  to  Construct  the  Box  Plot 

Go  to  14:  Appendix  for  Notes  for  the  TI-83,  83+,  84, 84+  Calculator.  To  create  the  box  plot: 

•  Press  Y=.  If  there  are  any  equations,  press  CLEAR  to  clear  them. 

•  Press  2nd  Y=. 

•  Press 4:PlotsOff. Press ENTER

•  Press 2nd Y=

•  Press 1:Plot1. Press ENTER.

•  Arrow down and then use the right arrow key to go to the 5th picture which is the box plot.
Press ENTER.

•  Arrow down to Xlist: Press 2nd 1 for L1

•  Arrow down to Freq: Press ALPHA. Press 1.

•  Press ZOOM. Press 9:ZoomStat.

•  Press TRACE and use the arrow keys to examine the box plot.

^http://education.ti.com/educationportal/sites/US/sectionHome/support.html


a.  Each quarter has 25% of the data.

b.  The spreads of the four quarters are 64.5 - 59 = 5.5 (first quarter), 66 - 64.5 = 1.5 (second quarter),
70 - 66 = 4 (third quarter), and 77 - 70 = 7 (fourth quarter). So, the second quarter has the
smallest spread and the fourth quarter has the largest spread.

c.  Interquartile Range: IQR = Q3 - Q1 = 70 - 64.5 = 5.5.

d.  The interval 59 through 65 has more than 25% of the data so it has more data in it than the
interval 66 through 70 which has 25% of the data.

e.  The middle 50% (middle half) of the data has a range of 5.5 inches.
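
For readers without a TI calculator, the five values and the spreads in items b and c can be checked with a short Python sketch. It relies on the standard library's `statistics.median` and splits the ordered data into halves by hand; the variable names are assumptions made for this sketch.

```python
from statistics import median

heights = [59, 60, 61, 62, 62, 63, 63, 64, 64, 64, 65, 65, 65, 65, 65, 65, 65,
           65, 65, 66, 66, 67, 67, 68, 68, 69, 70, 70, 70, 70, 70, 71, 71, 72,
           72, 73, 74, 74, 75, 77]

ordered = sorted(heights)
n = len(ordered)
q1 = median(ordered[: n // 2])          # middle value of the lower half
q2 = median(ordered)                    # median of all the data
q3 = median(ordered[(n + 1) // 2 :])    # middle value of the upper half

five_numbers = [ordered[0], q1, q2, q3, ordered[-1]]
# Spread of each quarter: difference between neighbouring summary values.
spreads = [b - a for a, b in zip(five_numbers, five_numbers[1:])]
iqr = q3 - q1

print(five_numbers)
print(spreads)
print(iqr)
```

The list `spreads` holds the widths of the first through fourth quarters in order, so the smallest and largest entries identify the most and least concentrated quarters.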

For  some  sets  of  data,  some  of  the  largest  value,  smallest  value,  first  quartile,  median,  and  third 
quartile  may  be  the  same.  For  instance,  you  might  have  a  data  set  in  which  the  median  and  the 
third  quartile  are  the  same.  In  this  case,  the  diagram  would  not  have  a  dotted  line  inside  the  box 
displaying  the  median.  The  right  side  of  the  box  would  display  both  the  third  quartile  and  the 
median.  For  example,  if  the  smallest  value  and  the  first  quartile  were  both  1,  the  median  and  the 
third  quartile  were  both  5,  and  the  largest  value  was  7,  the  box  plot  would  look  as  follows: 

Example 2.9

Test  scores  for  a  college  statistics  class  held  during  the  day  are: 


99;  56;  78;  55.5;  32;  90;  80;  81;  56;  59;  45;  77;  84.5;  84;  70;  72;  68;  32;  79;  90 


Test  scores  for  a  college  statistics  class  held  during  the  evening  are: 


98;  78;  68;  83;  81;  89;  88;  76;  65;  45;  98;  90;  80;  84.5;  85;  79;  78;  98;  90;  79;  81;  25.5 

Problem  (Solution on p. 114.)

•  What are the smallest and largest data values for each data set?

•  What  is  the  median,  the  first  quartile,  and  the  third  quartile  for  each  data  set? 

•  Create  a  boxplot  for  each  set  of  data. 

•  Which boxplot has the widest spread for the middle 50% of the data (the data between the
first and third quartiles)? What does this mean for that set of data in comparison to the other
set of data?

•  For each data set, what percent of the data is between the smallest value and the first
quartile? (Answer: 25%) the first quartile and the median? (Answer: 25%) the median and the
third quartile? the third quartile and the largest value? What percent of the data is between
the first quartile and the largest value? (Answer: 75%)

The first data set (the top box plot) has the widest spread for the middle 50% of the data. IQR =
Q3 - Q1 is 82.5 - 56 = 26.5 for the first data set and 89 - 78 = 11 for the second data set.
So, the first set of data has its middle 50% of scores more spread out.

25%  of  the  data  is  between  M  and  Q3  and  25%  is  between  Q3  and  Xmax. 


2.6  Measures  of  the  Location  of  the  Data^ 

The common measures of location are quartiles and percentiles (%iles). Quartiles are special percentiles.
The first quartile, Q1, is the same as the 25th percentile (25th %ile) and the third quartile, Q3, is the same as
the 75th percentile (75th %ile). The median, M, is called both the second quartile and the 50th percentile
(50th %ile).

NOTE:  Quartiles  are  given  special  attention  in  the  Box  Plots  module  in  this  chapter. 

To  calculate  quartiles  and  percentiles,  the  data  must  be  ordered  from  smallest  to  largest.  Recall  that 
quartiles  divide  ordered  data  into  quarters.  Percentiles  divide  ordered  data  into  hundredths.  To  score  in 
the  90th  percentile  of  an  exam  does  not  mean,  necessarily,  that  you  received  90%  on  a  test.  It  means  that 
90%  of  test  scores  are  the  same  or  less  than  your  score  and  10%  of  the  test  scores  are  the  same  or  greater 
than  your  test  score. 

Percentiles  are  useful  for  comparing  values.  For  this  reason,  universities  and  colleges  use  percentiles 
extensively. 

Percentiles are mostly used with very large populations. Therefore, if you were to say that 90% of
the test scores are less (and not the same or less) than your score, it would be acceptable because removing
one particular data value is not significant.

The interquartile range is a number that indicates the spread of the middle half or the middle 50% of the
data. It is the difference between the third quartile (Q3) and the first quartile (Q1).

IQR = Q3 - Q1  (2.5)

The IQR can help to determine potential outliers. A value is suspected to be a potential outlier if it is
more than (1.5)(IQR) below the first quartile or more than (1.5)(IQR) above the third quartile. Potential
outliers always need further investigation.


'This content is available online at <http://cnx.org/content/m16314/1.18/>.

Example 2.10

For the following 13 real estate prices, calculate the IQR and determine if any prices are outliers.
Prices are in dollars. (Source: San Jose Mercury News)

389,950;  230,500;  158,000;  479,000;  639,000;  114,950;  5,500,000;  387,000;  659,000;  529,000;  575,000; 
488,800;  1,095,000 

Solution 

Order  the  data  from  smallest  to  largest. 

114,950;  158,000;  230,500;  387,000;  389,950;  479,000;  488,800;  529,000;  575,000;  639,000;  659,000; 
1,095,000;  5,500,000 

M = 488,800

\( Q_1 = \frac{230500 + 387000}{2} = 308750 \)

\( Q_3 = \frac{639000 + 659000}{2} = 649000 \)

IQR = 649000 - 308750 = 340250

(1.5)(IQR) = (1.5)(340250) = 510375

Q1 - (1.5)(IQR) = 308750 - 510375 = -201625

Q3 + (1.5)(IQR) = 649000 + 510375 = 1159375

No  house  price  is  less  than  -201625.  However,  5,500,000  is  more  than  1,159,375.  Therefore, 
5,500,000  is  a  potential  outlier. 
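
The outlier check in this solution can be reproduced with the sketch below. It is an illustration only: the names `lower_fence` and `upper_fence` are not used in the text, and the halves are formed by leaving the median out because the number of values is odd, as in the solution above.

```python
from statistics import median

prices = [389950, 230500, 158000, 479000, 639000, 114950, 5500000,
          387000, 659000, 529000, 575000, 488800, 1095000]

ordered = sorted(prices)
n = len(ordered)
q1 = median(ordered[: n // 2])          # lower half (median itself left out)
q3 = median(ordered[(n + 1) // 2 :])    # upper half (median itself left out)
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr            # values below this are potential outliers
upper_fence = q3 + 1.5 * iqr            # values above this are potential outliers
outliers = [p for p in ordered if p < lower_fence or p > upper_fence]

print(q1, q3, iqr)
print(lower_fence, upper_fence)
print(outliers)
```

A value flagged this way is only a potential outlier; as the text notes, it still needs further investigation.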


Example  2.11 

For  the  two  data  sets  in  the  test  scores  example  (p.  70),  find  the  following: 

a.  The  interquartile  range.  Compare  the  two  interquartile  ranges. 

b.  Any  outliers  in  either  set. 

c.  The 30th percentile and the 80th percentile for each set. How much data falls below the 30th
percentile? Above the 80th percentile?


Example  2.12:  Finding  Quartiles  and  Percentiles  Using  a  Table 

Fifty statistics students were asked how much sleep they get per school night (rounded to the
nearest hour). The results were (student data):


AMOUNT OF SLEEP PER      FREQUENCY    RELATIVE      CUMULATIVE RELATIVE
SCHOOL NIGHT (HOURS)                  FREQUENCY     FREQUENCY

4                        2            0.04          0.04
5                        5            0.10          0.14
6                        7            0.14          0.28
7                        12           0.24          0.52
8                        14           0.28          0.80
9                        7            0.14          0.94
10                       3            0.06          1.00

Table  2.5 


Find the 28th percentile: Notice the 0.28 in the "cumulative relative frequency" column. 28% of 50
data values = 14. There are 14 values less than the 28th %ile. They include the two 4s, the five 5s,
and the seven 6s. The 28th %ile is between the last 6 and the first 7. The 28th %ile is 6.5.

Find the median: Look again at the "cumulative relative frequency" column and find 0.52. The
median is the 50th %ile or the second quartile. 50% of 50 = 25. There are 25 values less than the
median. They include the two 4s, the five 5s, the seven 6s, and eleven of the 7s. The median or
50th %ile is between the 25th (7) and 26th (7) values. The median is 7.

Find the third quartile: The third quartile is the same as the 75th percentile. You can "eyeball" this
answer. If you look at the "cumulative relative frequency" column, you find 0.52 and 0.80. When
you have all the 4s, 5s, 6s and 7s, you have 52% of the data. When you include all the 8s, you have
80% of the data. The 75th %ile, then, must be an 8. Another way to look at the problem is to find
75% of 50 (= 37.5) and round up to 38. The third quartile, Q3, is the 38th value which is an 8. You
can check this answer by counting the values. (There are 37 values below the third quartile and 12
values above.)
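
The three calculations above follow a pattern that can be sketched in code. The rule in `percentile` is an assumption inferred from these worked examples rather than a definition given in the text: when p% of n is a whole number k, average the k-th and (k+1)-th ordered values; otherwise round up and take that value. The function name is hypothetical, and the rule is only meant for percentiles strictly between 0 and 100.

```python
# Frequency table from the example: hours of sleep -> number of students.
table = {4: 2, 5: 5, 6: 7, 7: 12, 8: 14, 9: 7, 10: 3}
n = sum(table.values())

# Rebuild the cumulative relative frequency column.
running = 0
cumulative = {}
for hours in sorted(table):
    running += table[hours]
    cumulative[hours] = running / n

# Expand the table into the ordered list of all 50 data values.
ordered = [hours for hours in sorted(table) for _ in range(table[hours])]


def percentile(p):
    """p-th percentile (0 < p < 100), following the pattern of the worked examples."""
    if (p * n) % 100 == 0:
        k = p * n // 100                        # p% of n is a whole number
        return (ordered[k - 1] + ordered[k]) / 2
    k = p * n // 100 + 1                        # otherwise round up
    return ordered[k - 1]


print(cumulative)
print(percentile(28), percentile(50), percentile(75))
```

Statistical software often interpolates percentiles differently, so results from other tools may not match this table-based approach exactly.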

Example  2.13 

Using  the  table: 

1.  Find  the  80th  percentile. 

2.  Find  the  90th  percentile. 

3.  Find  the  first  quartile. 

4.  What  is  another  name  for  the  first  quartile? 


Collaborative  Classroom  Exercise:  Your  instructor  or  a  member  of  the  class  will  ask  everyone  in  class  how 
many  sweaters  they  own.  Answer  the  following  questions. 

1.  How  many  students  were  surveyed? 

2.  What  kind  of  sampling  did  you  do? 

3.  Construct  a  table  of  the  data. 

4.  Construct 2 different histograms. For each, starting value = _______ ending value = _______.

5.  Use  the  table  to  find  the  median,  first  quartile,  and  third  quartile. 

6.  Construct  a  box  plot. 

7.  Use  the  table  to  find  the  following: 

•  The  10th  percentile 

•  The  70th  percentile 

•  The  percent  of  students  who  own  less  than  4  sweaters 

Interpreting Percentiles, Quartiles, and Median

A  percentile  indicates  the  relative  standing  of  a  data  value  when  data  are  sorted  into  numerical  order,  from 
smallest  to  largest.  p%  of  data  values  are  less  than  or  equal  to  the  pth  percentile.  For  example,  15%  of  data 
values  are  less  than  or  equal  to  the  15th  percentile. 

•  Low  percentiles  always  correspond  to  lower  data  values. 

•  High  percentiles  always  correspond  to  higher  data  values. 
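
The statement that p% of data values are less than or equal to the pth percentile can be explored numerically. The sketch below reuses the sleep data from the table in Example 2.12; the function name `percent_at_or_below` is made up for this illustration.

```python
# Hours of sleep from the table in Example 2.12, expanded to 50 ordered values.
table = {4: 2, 5: 5, 6: 7, 7: 12, 8: 14, 9: 7, 10: 3}
ordered = [hours for hours in sorted(table) for _ in range(table[hours])]


def percent_at_or_below(value):
    """Percent of data values that are less than or equal to the given value."""
    count = sum(1 for x in ordered if x <= value)
    return 100 * count / len(ordered)


print(percent_at_or_below(6))
print(percent_at_or_below(8))
```

Larger data values always give a percent at least as large as smaller ones, which is the sense in which high percentiles correspond to higher data values.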

A percentile may or may not correspond to a value judgment about whether it is "good" or "bad". The
interpretation of whether a certain percentile is good or bad depends on the context of the situation to
which the data applies. In some situations, a low percentile would be considered "good"; in other contexts
a high percentile might be considered "good". In many situations, there is no value judgment that applies.

Understanding  how  to  properly  interpret  percentiles  is  important  not  only  when  describing  data, 
but  is  also  important  in  later  chapters  of  this  textbook  when  calculating  probabilities. 

Guideline: 

When  writing  the  interpretation  of  a  percentile  in  the  context  of  the  given  data,  the  sentence  should 
contain  the  following  information: 

•  information  about  the  context  of  the  situation  being  considered, 

•  the  data  value  (value  of  the  variable)  that  represents  the  percentile, 

•  the  percent  of  individuals  or  items  with  data  values  below  the  percentile. 

•  Additionally,  you  may  also  choose  to  state  the  percent  of  individuals  or  items  with  data  values  above 
the  percentile. 

Example  2.14 

On  a  timed  math  test,  the  first  quartile  for  times  for  finishing  the  exam  was  35  minutes.  Interpret 
the  first  quartile  in  the  context  of  this  situation. 

•  25%  of  students  finished  the  exam  in  35  minutes  or  less. 

•  75%  of  students  finished  the  exam  in  35  minutes  or  more. 

•  A  low  percentile  could  be  considered  good,  as  finishing  more  quickly  on  a  timed  exam  is 
desirable.  (If  you  take  too  long,  you  might  not  be  able  to  finish.) 

Example  2.15 

On  a  20  question  math  test,  the  70th  percentile  for  number  of  correct  answers  was  16.  Interpret 
the  70th  percentile  in  the  context  of  this  situation. 

•  70%  of  students  answered  16  or  fewer  questions  correctly. 

•  30%  of  students  answered  16  or  more  questions  correctly. 

•  Note:  A  high  percentile  could  be  considered  good,  as  answering  more  questions  correctly  is 
desirable. 

Example  2.16 

At  a  certain  community  college,  it  was  found  that  the  30th  percentile  of  credit  units  that  students 
are  enrolled  for  is  7  units.  Interpret  the  30th  percentile  in  the  context  of  this  situation. 

•  30%  of  students  are  enrolled  in  7  or  fewer  credit  units 

•  70%  of  students  are  enrolled  in  7  or  more  credit  units 

•  In  this  example,  there  is  no  "good"  or  "bad"  value  judgment  associated  with  a  higher  or 
lower  percentile.  Students  attend  community  college  for  varied  reasons  and  needs,  and  their 
course  load  varies  according  to  their  needs. 

Do the following Practice Problems for Interpreting Percentiles

Exercise  2.6.1  (Solution  on  p.  115.) 

a.  For runners in a race, a low time means a faster run. The winners in a race have the shortest
running times. Is it more desirable to have a finish time with a high or a low percentile when
running a race?

b.  The 20th percentile of run times in a particular race is 5.2 minutes. Write a sentence interpreting
the 20th percentile in the context of the situation.

c.  A bicyclist in the 90th percentile of a bicycle race between two towns completed the race in 1
hour and 12 minutes. Is he among the fastest or slowest cyclists in the race? Write a sentence
interpreting the 90th percentile in the context of the situation.

Exercise  2.6.2  (Solution  on  p.  116.) 

a.  For runners in a race, a higher speed means a faster run. Is it more desirable to have a speed
with a high or a low percentile when running a race?

b.  The 40th percentile of speeds in a particular race is 7.5 miles per hour. Write a sentence
interpreting the 40th percentile in the context of the situation.

Exercise  2.6.3  (Solution  on  p.  116.) 

On  an  exam,  would  it  be  more  desirable  to  earn  a  grade  with  a  high  or  low  percentile?  Explain. 

Exercise  2.6.4  (Solution  on  p.  116.) 

Mina  is  waiting  in  line  at  the  Department  of  Motor  Vehicles  (DMV).  Her  wait  time  of  32  minutes 
is  the  85th  percentile  of  wait  times.  Is  that  good  or  bad?  Write  a  sentence  interpreting  the  85th 
percentile  in  the  context  of  this  situation. 

Exercise  2.6.5  (Solution  on  p.  116.) 

In a survey collecting data about the salaries earned by recent college graduates, Li found that her
salary was in the 78th percentile. Should Li be pleased or upset by this result? Explain.

Exercise  2.6.6  (Solution  on  p.  116.) 

In a study collecting data about the repair costs of damage to automobiles in a certain type of
crash test, a certain model of car had $1700 in damage and was in the 90th percentile. Should the
manufacturer and/or a consumer be pleased or upset by this result? Explain. Write a sentence
that interprets the 90th percentile in the context of this problem.

Exercise  2.6.7  (Solution  on  p.  116.) 

The University of California has two criteria used to set admission standards for freshmen to be
admitted to a college in the UC system:

a.  Students' GPAs and scores on standardized tests (SATs and ACTs) are entered into a formula
that calculates an "admissions index" score. The admissions index score is used to set
eligibility standards intended to meet the goal of admitting the top 12% of high school students
in the state. In this context, what percentile does the top 12% represent?

b.  Students whose GPAs are at or above the 96th percentile of all students at their high school
are eligible (called eligible in the local context), even if they are not in the top 12% of all
students in the state. What percent of students from each high school are "eligible in the
local context"?

Exercise  2.6.8  (Solution  on  p.  116.) 

Suppose that you are buying a house. You and your realtor have determined that the most
expensive house you can afford is the 34th percentile. The 34th percentile of housing prices is $240,000
in the town you want to move to. In this town, can you afford 34% of the houses or 66% of the
houses?

**With  contributions  from  Roberta  Bloom 


2.7  Measures  of  the  Center  of  the  Data^° 

The  "center"  of  a  data  set  is  also  a  way  of  describing  location.  The  two  most  widely  used  measures  of  the 
"center"  of  the  data  are  the  mean  (average)  and  the  median.  To  calculate  the  mean  weight  of  50  people, 
add  the  50  weights  together  and  divide  by  50.  To  find  the  median  weight  of  the  50  people,  order  the  data 
and  find  the  number  that  splits  the  data  into  two  equal  parts  (previously  discussed  under  box  plots  in  this 
chapter).  The  median  is  generally  a  better  measure  of  the  center  when  there  are  extreme  values  or  outliers 
because  it  is  not  affected  by  the  precise  numerical  values  of  the  outliers.  The  mean  is  the  most  common 
measure  of  the  center. 

NOTE:  The  words  "mean"  and  "average"  are  often  used  interchangeably.  The  substitution  of  one 
word  for  the  other  is  common  practice.  The  technical  term  is  "arithmetic  mean"  and  "average"  is 
technically  a  center  location.  However,  in  practice  among  non-statisticians,  "average"  is  commonly 
accepted  for  "arithmetic  mean." 

The mean can also be calculated by multiplying each distinct value by its frequency and then dividing the
sum by the total number of data values. The letter used to represent the sample mean is an x with a bar
over it (pronounced "x bar"): x̄.

The Greek letter μ (pronounced "mew") represents the population mean. One of the requirements for the
sample mean to be a good estimate of the population mean is for the sample taken to be truly random.

To  see  that  both  ways  of  calculating  the  mean  are  the  same,  consider  the  sample: 

1;  1;  1;  2;  2;  3;  4;  4;  4;  4;  4 

\[ \bar{x} = \frac{1+1+1+2+2+3+4+4+4+4+4}{11} = 2.7 \tag{2.6} \]

\[ \bar{x} = \frac{3 \times 1 + 2 \times 2 + 1 \times 3 + 5 \times 4}{11} = 2.7 \tag{2.7} \]

In the second calculation for the sample mean, the frequencies are 3, 2, 1, and 5.

You can quickly find the location of the median by using the expression \( \frac{n+1}{2} \).

The letter n is the total number of data values in the sample. If n is an odd number, the median is the middle
value of the ordered data (ordered smallest to largest). If n is an even number, the median is equal to the
two middle values added together and divided by 2 after the data has been ordered. For example, if the
total number of data values is 97, then \( \frac{n+1}{2} = \frac{97+1}{2} = 49 \). The median is the 49th
value in the ordered data. If the total number of data values is 100, then
\( \frac{n+1}{2} = \frac{100+1}{2} = 50.5 \). The median occurs midway between the
50th and 51st values. The location of the median and the value of the median are not the same. The upper
case letter M is often used to represent the median. The next example illustrates the location of the median
and the value of the median.
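
As a quick check that the two ways of computing the mean agree, and to show the location expression (n + 1)/2 in use, here is a small sketch. The helper name `median_location` is an assumption made for this illustration.

```python
from collections import Counter

sample = [1, 1, 1, 2, 2, 3, 4, 4, 4, 4, 4]
n = len(sample)

# Way 1: add every value, then divide by the number of values.
mean_direct = sum(sample) / n

# Way 2: multiply each distinct value by its frequency, add, then divide.
freq = Counter(sample)   # frequencies 3, 2, 1, and 5 for the values 1, 2, 3, and 4
mean_weighted = sum(value * count for value, count in freq.items()) / n


def median_location(n):
    """Position of the median in the ordered data: (n + 1) / 2."""
    return (n + 1) / 2


print(round(mean_direct, 1), round(mean_weighted, 1))
print(median_location(97), median_location(100))
```

A location ending in .5 means the median sits midway between two ordered values, so those two values are averaged.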

Example  2.17 

AIDS  data  indicating  the  number  of  months  an  AIDS  patient  lives  after  taking  a  new  antibody 
drug  are  as  follows  (smallest  to  largest): 


^"^This content is available online at <http://cnx.org/content/m17102/1.13/>

3;  4;  8;  8;  10;  11;  12;  13;  14;  15;  15;  16;  16;  17;  17;  18;  21;  22;  22;  24;  24;  25;  26;  26;  27;  27;  29;  29;  31;  32;
33;  33;  34;  34;  35;  37;  40;  44;  44;  47 

Calculate  the  mean  and  the  median. 
Solution 

The  calculation  for  the  mean  is: 

\[ \bar{x} = \frac{3+4+(8)(2)+10+11+12+13+14+(15)(2)+(16)(2)+\dots+35+37+40+(44)(2)+47}{40} = 23.6 \]

To find the median, M, first use the formula for the location. The location is:

\[ \frac{n+1}{2} = \frac{40+1}{2} = 20.5 \]

Starting  at  the  smallest  value,  the  median  is  located  between  the  20th  and  21st  values  (the  two 
24s): 

3;  4;  8;  8;  10;  11;  12;  13;  14;  15;  15;  16;  16;  17;  17;  18;  21;  22;  22;  24;  24;  25;  26;  26;  27;  27;  29;  29;  31;  32; 
33;  33;  34;  34;  35;  37;  40;  44;  44;  47 

\[ M = \frac{24+24}{2} = 24 \]

The  median  is  24. 

Using the TI-83, 83+, 84, 84+ Calculators

Calculator Instructions are located in the menu item 14:Appendix (Notes for the TI-83, 83+, 84,
84+ Calculators).

• Enter data into the list editor. Press STAT 1:EDIT

• Put the data values in list L1.

• Press STAT and arrow to CALC. Press 1:1-VarStats. Press 2nd 1 for L1 and ENTER.

• Press the down and up arrow keys to scroll.

$\bar{x} = 23.6$, $M = 24$

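
The same calculation can be checked with software. The following is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative and not part of the original example.

```python
from statistics import mean, median

# Survival times in months from Example 2.17, already ordered smallest to largest.
months = [3, 4, 8, 8, 10, 11, 12, 13, 14, 15, 15, 16, 16, 17, 17, 18, 21, 22, 22, 24,
          24, 25, 26, 26, 27, 27, 29, 29, 31, 32, 33, 33, 34, 34, 35, 37, 40, 44, 44, 47]

n = len(months)            # number of data values
location = (n + 1) / 2     # location of the median in the ordered data
x_bar = mean(months)       # sample mean
M = median(months)         # value of the median (average of the 20th and 21st values)

print("n =", n, "location =", location)
print("mean =", round(x_bar, 1), "median =", M)
```

Note that `location` (20.5) is a position in the ordered list, while `M` (24) is a data value, which is the distinction the example draws.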
Example  2.18 

Suppose  that,  in  a  small  town  of  50  people,  one  person  earns  $5,000,000  per  year  and  the  other  49 
each  earn  $30,000.  Which  is  the  better  measure  of  the  "center,"  the  mean  or  the  median? 

Solution 

$\bar{x} = \frac{5000000 + 49 \times 30000}{50} = 129400$

M  =  30000 

(There  are  49  people  who  earn  $30,000  and  one  person  who  earns  $5,000,000.) 

The  median  is  a  better  measure  of  the  "center"  than  the  mean  because  49  of  the  values  are  30,000 
and  one  is  5,000,000.  The  5,000,000  is  an  outlier.  The  30,000  gives  us  a  better  sense  of  the  middle  of 
the  data. 
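
A short sketch of Example 2.18 in Python shows how a single outlier pulls the mean but leaves the median alone. The list name `incomes` is an illustrative choice.

```python
from statistics import mean, median

# One person earns $5,000,000; the other 49 each earn $30,000.
incomes = [5_000_000] + [30_000] * 49

x_bar = mean(incomes)    # pulled upward by the single outlier
M = median(incomes)      # unaffected by the outlier

print("mean =", x_bar)
print("median =", M)
```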


Another measure of the center is the mode. The mode is the most frequent value. If two values tie for the
highest frequency, the data set is bimodal.

Example 2.19

Statistics exam scores for 20 students are as follows:
50; 53; 59; 59; 63; 63; 72; 72; 72; 72; 72; 76; 78; 81; 83; 84; 84; 84; 90; 93

Problem 

Find  the  mode. 

Solution 

The  most  frequent  score  is  72,  which  occurs  five  times.  Mode  =  72. 


Example  2.20 

Five  real  estate  exam  scores  are  430, 430, 480, 480, 495.  The  data  set  is  bimodal  because  the  scores 
430  and  480  each  occur  twice. 

When  is  the  mode  the  best  measure  of  the  "center"?  Consider  a  weight  loss  program  that  advertises 
a  mean  weight  loss  of  six  pounds  the  first  week  of  the  program.  The  mode  might  indicate  that  most 
people lose two pounds the first week, making the program less appealing.

NOTE:  The  mode  can  be  calculated  for  qualitative  data  as  well  as  for  quantitative  data. 

Statistical software will easily calculate the mean, the median, and the mode. Some graphing
calculators can also make these calculations. In the real world, people make these calculations
using software.
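
As one illustration of such software, the sketch below finds the modes for Examples 2.19 and 2.20 with Python's `statistics.multimode` (available in Python 3.8 and later). The list of colors is a made-up qualitative data set, included only to show that the mode does not require numbers.

```python
from statistics import multimode

exam_scores = [50, 53, 59, 59, 63, 63, 72, 72, 72, 72, 72, 76, 78, 81, 83, 84, 84, 84, 90, 93]
real_estate_scores = [430, 430, 480, 480, 495]
colors = ["blue", "red", "blue", "green"]   # hypothetical qualitative data

# multimode returns every value that ties for the highest frequency.
print(multimode(exam_scores))          # one mode
print(multimode(real_estate_scores))   # two modes: the set is bimodal
print(multimode(colors))               # the mode of qualitative data
```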


2.7.1 The Law of Large Numbers and the Mean

The Law of Large Numbers says that if you take samples of larger and larger size from any population,
then the mean $\bar{x}$ of the sample is very likely to get closer and closer to $\mu$. This is discussed in more detail in
The Central Limit Theorem.
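
A small simulation can make this idea concrete. The sketch below assumes a hypothetical population, the rolls of a fair six-sided die, whose population mean is 3.5; the seed and the sample sizes are arbitrary choices. It is an illustration of the tendency, not a proof.

```python
import random
from statistics import mean

rng = random.Random(2024)   # fixed seed so the run is repeatable
mu = 3.5                    # population mean of a fair die (assumed population)

def sample_mean(n):
    """Mean of a random sample of n die rolls."""
    return mean([rng.randint(1, 6) for _ in range(n)])

# Distance between the sample mean and the population mean for growing n.
errors = {n: abs(sample_mean(n) - mu) for n in (10, 1_000, 100_000)}

for n, err in errors.items():
    print(f"n = {n:>6}: |x_bar - mu| = {err:.4f}")
```

The distance does not have to shrink at every step, but for very large samples it is very likely to be small.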

NOTE: The formula for the mean is located in the Summary of Formulas (Section 2.10).


2.7.2  Sampling  Distributions  and  Statistic  of  a  Sampling  Distribution 

You  can  think  of  a  sampling  distribution  as  a  relative  frequency  distribution  with  a  great  many  samples. 
(See  Sampling  and  Data  for  a  review  of  relative  frequency).  Suppose  thirty  randomly  selected  students 
were  asked  the  number  of  movies  they  watched  the  previous  week.  The  results  are  in  the  relative  frequency 
table  shown  below. 


# of movies    Relative Frequency
0              5/30
1              15/30
2              6/30
3              4/30
4              1/30

Table 2.6

If you let the number of samples get very large (say, 300 million or more), the relative frequency table
becomes a relative frequency distribution.

A statistic is a number calculated from a sample. Statistic examples include the mean, the median and the
mode as well as others. The sample mean $\bar{x}$ is an example of a statistic which estimates the population
mean $\mu$.

2.8 Skewness and the Mean, Median, and Mode

Consider  the  following  data  set: 
4;5;6;6;6;7;7;7;7;7;7;8;8;8;9;10 

This  data  set  produces  the  histogram  shown  below.  Each  interval  has  width  one  and  each  value  is  located 
in  the  middle  of  an  interval. 


[Figure: histogram of the data, horizontal axis labeled 4 through 10]

The  histogram  displays  a  symmetrical  distribution  of  data.  A  distribution  is  symmetrical  if  a  vertical  line 
can  be  drawn  at  some  point  in  the  histogram  such  that  the  shape  to  the  left  and  the  right  of  the  vertical 
line  are  mirror  images  of  each  other.  The  mean,  the  median,  and  the  mode  are  each  7  for  these  data.  In  a 
perfectly  symmetrical  distribution,  the  mean  and  the  median  are  the  same.  This  example  has  one  mode 
(unimodal)  and  the  mode  is  the  same  as  the  mean  and  median.  In  a  symmetrical  distribution  that  has  two 
modes  (bimodal),  the  two  modes  would  be  different  from  the  mean  and  median. 

The  histogram  for  the  data: 

4;5;6;6;6;7;7;7;7;8 

is not symmetrical. The right-hand side seems "chopped off" compared to the left side. The shape of the
distribution is called skewed to the left because it is pulled out to the left.

[Figure: histogram of the data, horizontal axis labeled 4 through 8]

This content is available online at <http://cnx.org/content/m17104/1.9/>.

The  mean  is  6.3,  the  median  is  6.5,  and  the  mode  is  7.  Notice  that  the  mean  is  less  than  the  median  and 
they  are  both  less  than  the  mode.  The  mean  and  the  median  both  reflect  the  skewing  but  the  mean  more 
so. 


The  histogram  for  the  data: 
6;7;7;7;7;8;8;8;9;10 
is  also  not  symmetrical.  It  is  skewed  to  the  right. 


[Figure: histogram of the data, horizontal axis labeled 6 through 10]


The  mean  is  7.7,  the  median  is  7.5,  and  the  mode  is  7.  Of  the  three  statistics,  the  mean  is  the  largest,  while 
the  mode  is  the  smallest.  Again,  the  mean  reflects  the  skewing  the  most. 

To  summarize,  generally  if  the  distribution  of  data  is  skewed  to  the  left,  the  mean  is  less  than  the  median, 
which  is  often  less  than  the  mode.  If  the  distribution  of  data  is  skewed  to  the  right,  the  mode  is  often  less 
than  the  median,  which  is  less  than  the  mean. 
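
The three data sets in this section can be compared side by side with a short script. This is a minimal sketch; the dictionary keys are descriptive labels chosen here, not names from the text.

```python
from statistics import mean, median, mode

data_sets = {
    "symmetrical":     [4, 5, 6, 6, 6, 7, 7, 7, 7, 7, 7, 8, 8, 8, 9, 10],
    "skewed to left":  [4, 5, 6, 6, 6, 7, 7, 7, 7, 8],
    "skewed to right": [6, 7, 7, 7, 7, 8, 8, 8, 9, 10],
}

# Collect (mean, median, mode) for each data set.
summary = {name: (mean(d), median(d), mode(d)) for name, d in data_sets.items()}

for name, (x_bar, M, mo) in summary.items():
    print(f"{name:>15}: mean = {x_bar:.1f}, median = {M}, mode = {mo}")
```

For the left-skewed set the order is mean < median < mode, and for the right-skewed set it is mode < median < mean, matching the summary above.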

Skewness and symmetry become important when we discuss probability distributions in later chapters.

2.9 Measures of the Spread of the Data

An  important  characteristic  of  any  set  of  data  is  the  variation  in  the  data.  In  some  data  sets,  the  data  values 
are  concentrated  closely  near  the  mean;  in  other  data  sets,  the  data  values  are  more  widely  spread  out  from 
the  mean.  The  most  common  measure  of  variation,  or  spread,  is  the  standard  deviation. 

The  standard  deviation  is  a  number  that  measures  how  far  data  values  are  from  their  mean. 

The  standard  deviation 

•  provides  a  numerical  measure  of  the  overall  amount  of  variation  in  a  data  set 

•  can  be  used  to  determine  whether  a  particular  data  value  is  close  to  or  far  from  the  mean 

The  standard  deviation  provides  a  measure  of  the  overall  variation  in  a  data  set 

The  standard  deviation  is  always  positive  or  0.  The  standard  deviation  is  small  when  the  data  are  all 
concentrated  close  to  the  mean,  exhibiting  little  variation  or  spread.  The  standard  deviation  is  larger  when 
the  data  values  are  more  spread  out  from  the  mean,  exhibiting  more  variation. 

Suppose  that  we  are  studying  waiting  times  at  the  checkout  line  for  customers  at  supermarket  A  and 
supermarket  B;  the  average  wait  time  at  both  markets  is  5  minutes.  At  market  A,  the  standard  deviation 
for  the  waiting  time  is  2  minutes;  at  market  B  the  standard  deviation  for  the  waiting  time  is  4  minutes. 

Because market B has a higher standard deviation, we know that there is more variation in the waiting
times at market B. Overall, wait times at market B are more spread out from the average; wait times at
market A are more concentrated near the average.

The  standard  deviation  can  be  used  to  determine  whether  a  data  value  is  close  to  or  far  from  the  mean. 

Suppose  that  Rosa  and  Binh  both  shop  at  Market  A.  Rosa  waits  for  7  minutes  and  Binh  waits  for  1  minute 
at  the  checkout  counter.  At  market  A,  the  mean  wait  time  is  5  minutes  and  the  standard  deviation  is  2 
minutes.  The  standard  deviation  can  be  used  to  determine  whether  a  data  value  is  close  to  or  far  from  the 
mean. 

Rosa  waits  for  7  minutes: 

•  7  is  2  minutes  longer  than  the  average  of  5;  2  minutes  is  equal  to  one  standard  deviation. 

•  Rosa's  wait  time  of  7  minutes  is  2  minutes  longer  than  the  average  of  5  minutes. 

•  Rosa's  wait  time  of  7  minutes  is  one  standard  deviation  above  the  average  of  5  minutes. 

Binh  waits  for  1  minute. 

•  1  is  4  minutes  less  than  the  average  of  5;  4  minutes  is  equal  to  two  standard  deviations. 

•  Binh's  wait  time  of  1  minute  is  4  minutes  less  than  the  average  of  5  minutes. 

•  Binh's  wait  time  of  1  minute  is  two  standard  deviations  below  the  average  of  5  minutes. 

•  A  data  value  that  is  two  standard  deviations  from  the  average  is  just  on  the  borderline  for  what  many 
statisticians  would  consider  to  be  far  from  the  average.  Considering  data  to  be  far  from  the  mean  if  it 
is  more  than  2  standard  deviations  away  is  more  of  an  approximate  "rule  of  thumb"  than  a  rigid  rule. 
In  general,  the  shape  of  the  distribution  of  the  data  affects  how  much  of  the  data  is  further  away  than 
2 standard deviations. (We will learn more about this in later chapters.)

The  number  line  may  help  you  understand  standard  deviation.  If  we  were  to  put  5  and  7  on  a  number  line, 
7  is  to  the  right  of  5.  We  say,  then,  that  7  is  one  standard  deviation  to  the  right  of  5  because 
5  +  (1)  (2)  =  7. 


This content is available online at <http://cnx.org/content/m17103/1.15/>.

• In general, a value = mean + (#ofSTDEVs)(standard deviation)

• where #ofSTDEVs = the number of standard deviations

• 7 is one standard deviation more than the mean of 5 because: 7 = 5 + (1)(2)

• 1 is two standard deviations less than the mean of 5 because: 1 = 5 + (-2)(2)

The equation value = mean + (#ofSTDEVs)(standard deviation) can be expressed for a sample and for a
population:

• Sample: x = $\bar{x}$ + (#ofSTDEVs)(s)

• Population: x = $\mu$ + (#ofSTDEVs)($\sigma$)

The lower case letter s represents the sample standard deviation and the Greek letter $\sigma$ (sigma, lower case)
represents the population standard deviation.

The symbol $\bar{x}$ is the sample mean and the Greek symbol $\mu$ is the population mean.
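
The relationship value = mean + (#ofSTDEVs)(standard deviation) can be used in both directions. The sketch below applies it to Rosa's and Binh's wait times at market A; the function names are illustrative.

```python
def value_at(mean, num_stdevs, stdev):
    """Value that lies num_stdevs standard deviations from the mean."""
    return mean + num_stdevs * stdev

def num_stdevs(value, mean, stdev):
    """How many standard deviations a value is from the mean (sign shows direction)."""
    return (value - mean) / stdev

mean_wait, sd_wait = 5, 2   # market A: mean 5 minutes, standard deviation 2 minutes

rosa = num_stdevs(7, mean_wait, sd_wait)   # Rosa waits 7 minutes
binh = num_stdevs(1, mean_wait, sd_wait)   # Binh waits 1 minute

print("Rosa:", rosa, "standard deviations from the mean")
print("Binh:", binh, "standard deviations from the mean")
print("Two standard deviations below the mean:", value_at(mean_wait, -2, sd_wait))
```

A positive result means the value is above the mean; a negative result means it is below.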
Calculating  the  Standard  Deviation 

If x is a number, then the difference "x - mean" is called its deviation. In a data set, there are as many
deviations as there are items in the data set. The deviations are used to calculate the standard deviation.
If the numbers belong to a population, in symbols a deviation is $x - \mu$. For sample data, in symbols a
deviation is $x - \bar{x}$.

The procedure to calculate the standard deviation depends on whether the numbers are the entire population
or are data from a sample. The calculations are similar, but not identical. Therefore the symbol used
to represent the standard deviation depends on whether it is calculated from a population or a sample.
The lower case letter s represents the sample standard deviation and the Greek letter $\sigma$ (sigma, lower case)
represents the population standard deviation. If the sample has the same characteristics as the population,
then s should be a good estimate of $\sigma$.

To calculate the standard deviation, we need to calculate the variance first. The variance is an average of
the squares of the deviations (the $x - \bar{x}$ values for a sample, or the $x - \mu$ values for a population). The
symbol $\sigma^2$ represents the population variance; the population standard deviation $\sigma$ is the square root of
the population variance. The symbol $s^2$ represents the sample variance; the sample standard deviation s is
the square root of the sample variance. You can think of the standard deviation as a special average of the
deviations.

If the numbers come from a census of the entire population and not a sample, when we calculate the average
of the squared deviations to find the variance, we divide by N, the number of items in the population.
If the data are from a sample rather than a population, when we calculate the average of the squared
deviations, we divide by n-1, one less than the number of items in the sample. You can see that in the formulas
below.


Available for free at Connexions <http://cnx.org/content/col10522/1.40>




Formulas for the Sample Standard Deviation

• $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n-1}}$ or $s = \sqrt{\frac{\sum f \cdot (x - \bar{x})^2}{n-1}}$

• For the sample standard deviation, the denominator is n-1, that is the sample size MINUS 1.

Formulas for the Population Standard Deviation

• $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ or $\sigma = \sqrt{\frac{\sum f \cdot (x - \mu)^2}{N}}$

• For the population standard deviation, the denominator is N, the number of items in the population.

In these formulas, f represents the frequency with which a value appears. For example, if a value appears
once, f is 1. If a value appears three times in the data set or population, f is 3.

Sampling  Variability  of  a  Statistic 

The statistic of a sampling distribution was discussed in Descriptive Statistics: Measuring the Center of
the Data. How much the statistic varies from one sample to another is known as the sampling variability of
a statistic. You typically measure the sampling variability of a statistic by its standard error. The standard
error of the mean is an example of a standard error. It is a special standard deviation and is known as the
standard deviation of the sampling distribution of the mean. You will cover the standard error of the mean
in The Central Limit Theorem (not now). The notation for the standard error of the mean is $\frac{\sigma}{\sqrt{n}}$ where $\sigma$ is
the standard deviation of the population and n is the size of the sample.
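
As a quick numerical illustration of this notation, the sketch below evaluates the standard error of the mean for two sample sizes. The population standard deviation of 2 and the sample sizes 25 and 100 are assumed values chosen only for the illustration.

```python
import math

def standard_error(sigma, n):
    """Standard error of the mean: population standard deviation over sqrt(sample size)."""
    return sigma / math.sqrt(n)

sigma = 2                           # assumed population standard deviation
se_25 = standard_error(sigma, 25)   # sample of size 25
se_100 = standard_error(sigma, 100) # sample of size 100

print("n = 25: ", se_25)
print("n = 100:", se_100)
```

Quadrupling the sample size halves the standard error, so sample means from larger samples vary less from one sample to another.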

NOTE: In practice, USE A CALCULATOR OR COMPUTER SOFTWARE TO CALCULATE
THE STANDARD DEVIATION. If you are using a TI-83, 83+, 84+ calculator, you need to select
the appropriate standard deviation $\sigma_x$ or $s_x$ from the summary statistics. We will concentrate on
using and interpreting the information that the standard deviation gives us. However you should
study the following step-by-step example to help you understand how the standard deviation
measures variation from the mean.

Example  2.21 

In  a  fifth  grade  class,  the  teacher  was  interested  in  the  average  age  and  the  sample  standard 
deviation  of  the  ages  of  her  students.  The  following  data  are  the  ages  for  a  SAMPLE  of  n  =  20  fifth 
grade  students.  The  ages  are  rounded  to  the  nearest  half  year: 

9; 9.5; 9.5; 10; 10; 10; 10; 10.5; 10.5; 10.5; 10.5; 11; 11; 11; 11; 11; 11; 11.5; 11.5; 11.5

$\bar{x} = \frac{9 + 9.5 \times 2 + 10 \times 4 + 10.5 \times 4 + 11 \times 6 + 11.5 \times 3}{20} = 10.525$ (2.8)

The average age is 10.53 years, rounded to 2 places.


The  variance  may  be  calculated  by  using  a  table.  Then  the  standard  deviation  is  calculated  by 
taking  the  square  root  of  the  variance.  We  will  explain  the  parts  of  the  table  after  calculating  s. 


Data    Freq.   Deviations                Deviations^2               (Freq.)(Deviations^2)
x       f       $(x - \bar{x})$           $(x - \bar{x})^2$          $(f)(x - \bar{x})^2$
9       1       9 - 10.525 = -1.525       (-1.525)^2 = 2.325625      1 x 2.325625 = 2.325625
9.5     2       9.5 - 10.525 = -1.025     (-1.025)^2 = 1.050625      2 x 1.050625 = 2.101250
10      4       10 - 10.525 = -0.525      (-0.525)^2 = 0.275625      4 x 0.275625 = 1.1025
10.5    4       10.5 - 10.525 = -0.025    (-0.025)^2 = 0.000625      4 x 0.000625 = 0.0025
11      6       11 - 10.525 = 0.475       (0.475)^2 = 0.225625       6 x 0.225625 = 1.35375
11.5    3       11.5 - 10.525 = 0.975     (0.975)^2 = 0.950625       3 x 0.950625 = 2.851875

Table 2.7

The sample variance, $s^2$, is equal to the sum of the last column (9.7375) divided by the total number
of data values minus one (20 - 1):

$s^2 = \frac{9.7375}{20-1} = 0.5125$

The sample standard deviation s is equal to the square root of the sample variance:

$s = \sqrt{0.5125} = 0.715891$. Rounded to two decimal places, s = 0.72.

Typically, you do the calculation for the standard deviation on your calculator or computer. The
intermediate results are not rounded. This is done for accuracy.
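
The table calculation can be reproduced in a few lines of code. The following sketch mirrors the columns of Table 2.7 for the ages in Example 2.21 and keeps the intermediate results unrounded; the variable names are illustrative.

```python
import math

# Data values and frequencies from Example 2.21 (a sample of n = 20 students).
ages = [9, 9.5, 10, 10.5, 11, 11.5]
freqs = [1, 2, 4, 4, 6, 3]

n = sum(freqs)                                           # sample size
x_bar = sum(f * x for x, f in zip(ages, freqs)) / n      # sample mean

# Last column of the table: frequency times squared deviation, then its sum.
last_column = [f * (x - x_bar) ** 2 for x, f in zip(ages, freqs)]
total = sum(last_column)

s_squared = total / (n - 1)      # sample variance: divide by n - 1
s = math.sqrt(s_squared)         # sample standard deviation

print("mean =", x_bar)
print("sum of last column =", round(total, 4))
print("variance =", round(s_squared, 4))
print("standard deviation =", round(s, 6))
```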

Problem  1 

Verify  the  mean  and  standard  deviation  calculated  above  on  your  calculator  or  computer. 
Solution 

Using the TI-83, 83+, 84+ Calculators

• Enter data into the list editor. Press STAT 1:EDIT. If necessary, clear the lists by arrowing up
into the name. Press CLEAR and arrow down.

• Put the data values (9, 9.5, 10, 10.5, 11, 11.5) into list L1 and the frequencies (1, 2, 4, 4, 6, 3)
into list L2. Use the arrow keys to move around.

• Press STAT and arrow to CALC. Press 1:1-VarStats and enter L1 (2nd 1), L2 (2nd 2). Do not
forget the comma. Press ENTER.

• $\bar{x}$ = 10.525

• Use Sx because this is sample data (not a population): Sx = 0.715891


• For the following problems, recall that value = mean + (#ofSTDEVs)(standard deviation)

• For a sample: x = $\bar{x}$ + (#ofSTDEVs)(s)

• For a population: x = $\mu$ + (#ofSTDEVs)($\sigma$)

• For this example, use x = $\bar{x}$ + (#ofSTDEVs)(s) because the data is from a sample

Problem 2

Find the value that is 1 standard deviation above the mean. Find $(\bar{x} + 1s)$.

Solution

$(\bar{x} + 1s) = 10.53 + (1)(0.72) = 11.25$

Problem 3

Find the value that is two standard deviations below the mean. Find $(\bar{x} - 2s)$.

Solution

$(\bar{x} - 2s) = 10.53 - (2)(0.72) = 9.09$

Problem 4

Find the values that are 1.5 standard deviations from (below and above) the mean.

Solution

• $(\bar{x} - 1.5s) = 10.53 - (1.5)(0.72) = 9.45$

• $(\bar{x} + 1.5s) = 10.53 + (1.5)(0.72) = 11.61$


Explanation  of  the  standard  deviation  calculation  shown  in  the  table 

The deviations show how spread out the data are about the mean. The data value 11.5 is farther from the
mean than is the data value 11. The deviations 0.975 and 0.475 indicate that. A positive deviation occurs when
the data value is greater than the mean. A negative deviation occurs when the data value is less than the
mean; the deviation is -1.525 for the data value 9. If you add the deviations, the sum is always zero. (For
this example, there are n=20 deviations.) So you cannot simply add the deviations to get the spread of the
data. By squaring the deviations, you make them positive numbers, and the sum will also be positive. The
variance, then, is the average squared deviation.

The variance is a squared measure and does not have the same units as the data. Taking the square root
solves the problem. The standard deviation measures the spread in the same units as the data.

Notice that instead of dividing by n=20, the calculation divided by n-1=20-1=19 because the data is a sample.
For the sample variance, we divide by the sample size minus one (n - 1). Why not divide by n? The
answer has to do with the population variance. The sample variance is an estimate of the population variance.
Based on the theoretical mathematics that lies behind these calculations, dividing by (n - 1) gives a
better estimate of the population variance.
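
One way to see why n - 1 gives a better estimate is to list every possible sample from a very small population and average the resulting sample variances. The sketch below does this under stated assumptions: the population {1, 2, 3, 4, 5, 6} is hypothetical, and samples of size 3 are drawn with replacement, so every ordered sample is equally likely.

```python
from itertools import product

population = [1, 2, 3, 4, 5, 6]    # hypothetical tiny population
N = len(population)
mu = sum(population) / N
sigma_squared = sum((x - mu) ** 2 for x in population) / N   # population variance

def variance(sample, divisor):
    """Sum of squared deviations from the sample mean, divided by the given divisor."""
    x_bar = sum(sample) / len(sample)
    return sum((x - x_bar) ** 2 for x in sample) / divisor

n = 3
samples = list(product(population, repeat=n))   # all 6^3 ordered samples

# Average the sample variance over every possible sample, using each divisor.
avg_divide_by_n = sum(variance(s, n) for s in samples) / len(samples)
avg_divide_by_n_minus_1 = sum(variance(s, n - 1) for s in samples) / len(samples)

print("population variance:      ", round(sigma_squared, 4))
print("average when dividing by n:  ", round(avg_divide_by_n, 4))
print("average when dividing by n-1:", round(avg_divide_by_n_minus_1, 4))
```

Under these assumptions, dividing by n underestimates the population variance on average, while dividing by n - 1 matches it.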

NOTE:  Your  concentration  should  be  on  what  the  standard  deviation  tells  us  about  the  data.  The 
standard  deviation  is  a  number  which  measures  how  far  the  data  are  spread  from  the  mean.  Let  a 
calculator  or  computer  do  the  arithmetic. 

The standard deviation, s or $\sigma$, is either zero or larger than zero. When the standard deviation is 0, there is
no spread; that is, all the data values are equal to each other. The standard deviation is small when the
data are all concentrated close to the mean, and is larger when the data values show more variation from
the mean. When the standard deviation is a lot larger than zero, the data values are very spread out about
the mean; outliers can make s or $\sigma$ very large.

The  standard  deviation,  when  first  presented,  can  seem  unclear.  By  graphing  your  data,  you  can  get  a 
better  "feel"  for  the  deviations  and  the  standard  deviation.  You  will  find  that  in  symmetrical  distributions, 
the  standard  deviation  can  be  very  helpful  but  in  skewed  distributions,  the  standard  deviation  may  not  be 
much  help.  The  reason  is  that  the  two  sides  of  a  skewed  distribution  have  different  spreads.  In  a  skewed 
distribution,  it  is  better  to  look  at  the  first  quartile,  the  median,  the  third  quartile,  the  smallest  value,  and 
the  largest  value.  Because  numbers  can  be  confusing,  always  graph  your  data. 

NOTE:  The  formula  for  the  standard  deviation  is  at  the  end  of  the  chapter. 

Example  2.22 

Use the following data (first exam scores) from Susan Dean's spring pre-calculus class:

33; 42; 49; 49; 53; 55; 55; 61; 63; 67; 68; 68; 69; 69; 72; 73; 74; 78; 80; 83; 88; 88; 88; 90; 92; 94; 94; 94; 94;
96; 100

a. Create a chart containing the data, frequencies, relative frequencies, and cumulative relative
frequencies to three decimal places.

b. Calculate the following to one decimal place using a TI-83+ or TI-84 calculator:

i. The sample mean

ii. The sample standard deviation

iii. The median

iv. The first quartile
v. The third quartile
vi. IQR

c.  Construct  a  box  plot  and  a  histogram  on  the  same  set  of  axes.  Make  comments  about  the  box 
plot,  the  histogram,  and  the  chart. 

Solution 


a.

Data    Frequency    Relative Frequency    Cumulative Relative Frequency
33      1            0.032                 0.032
42      1            0.032                 0.064
49      2            0.065                 0.129
53      1            0.032                 0.161
55      2            0.065                 0.226
61      1            0.032                 0.258
63      1            0.032                 0.290
67      1            0.032                 0.322
68      2            0.065                 0.387
69      2            0.065                 0.452
72      1            0.032                 0.484
73      1            0.032                 0.516
74      1            0.032                 0.548
78      1            0.032                 0.580
80      1            0.032                 0.612
83      1            0.032                 0.644
88      3            0.097                 0.741
90      1            0.032                 0.773
92      1            0.032                 0.805
94      4            0.129                 0.934
96      1            0.032                 0.966
100     1            0.032                 0.998 (Why isn't this value 1?)

Table 2.8


b.  i.  The  sample  mean  =  73.5 

ii.  The  sample  standard  deviation  =  17.9 

iii.  The  median  =  73 

iv.  The  first  quartile  =  61 
v. The third quartile = 90
vi.  IQR  =  90  -  61  =  29 

c. The x-axis goes from 32.5 to 100.5; y-axis goes from -2.4 to 15 for the histogram; number of
intervals is 5 for the histogram so the width of an interval is (100.5 - 32.5) divided by 5, which
is equal to 13.6.

[Figure 2.1: histogram and box plot of the exam scores on the same axes; x-axis marked at 32.5, 46.1, 59.7, 73.3, 86.9, 100.5]


The  long  left  whisker  in  the  box  plot  is  reflected  in  the  left  side  of  the  histogram.  The  spread  of 
the  exam  scores  in  the  lower  50%  is  greater  (73  -  33  =  40)  than  the  spread  in  the  upper  50%  (100  - 
73  =  27).  The  histogram,  box  plot,  and  chart  all  reflect  this.  There  are  a  substantial  number  of  A 
and  B  grades  (80s,  90s,  and  100).  The  histogram  clearly  shows  this.  The  box  plot  shows  us  that  the 
middle  50%  of  the  exam  scores  (IQR  =  29)  are  Ds,  Cs,  and  Bs.  The  box  plot  also  shows  us  that  the 
lower  25%  of  the  exam  scores  are  Ds  and  Fs. 
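
The statistics in part b can also be obtained with software. The sketch below uses Python's `statistics` module. One assumption to note: quartile conventions differ between tools, and the `method="exclusive"` option of `statistics.quantiles` is chosen here because it reproduces the calculator values for this data set; other software may report slightly different quartiles.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev

scores = [33, 42, 49, 49, 53, 55, 55, 61, 63, 67, 68, 68, 69, 69, 72, 73, 74, 78, 80, 83,
          88, 88, 88, 90, 92, 94, 94, 94, 94, 96, 100]

x_bar = mean(scores)
s = stdev(scores)                 # sample standard deviation (divides by n - 1)
M = median(scores)
q1, q2, q3 = quantiles(scores, n=4, method="exclusive")
iqr = q3 - q1

# Relative frequencies rounded to three places, as in the chart for part a.
n = len(scores)
rounded_rel_freqs = [round(count / n, 3) for count in Counter(scores).values()]
rounded_total = round(sum(rounded_rel_freqs), 3)

print("mean =", round(x_bar, 1), " s =", round(s, 1), " median =", M)
print("Q1 =", q1, " Q3 =", q3, " IQR =", iqr)
print("sum of rounded relative frequencies =", rounded_total)
```

The last line shows that the rounded relative frequencies do not add to exactly 1, because each entry was rounded before adding.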

Comparing  Values  from  Different  Data  Sets 

The  standard  deviation  is  useful  when  comparing  data  values  that  come  from  different  data  sets.  If  the  data 
sets  have  different  means  and  standard  deviations,  it  can  be  misleading  to  compare  the  data  values  directly. 

•  For  each  data  value,  calculate  how  many  standard  deviations  the  value  is  away  from  its  mean. 

• Use the formula: value = mean + (#ofSTDEVs)(standard deviation); solve for #ofSTDEVs.

• #ofSTDEVs = $\frac{\text{value} - \text{mean}}{\text{standard deviation}}$

•  Compare  the  results  of  this  calculation. 

#ofSTDEVs  is  often  called  a  "z-score";  we  can  use  the  symbol  z.  In  symbols,  the  formulas  become: 


Sample        $x = \bar{x} + zs$       $z = \frac{x - \bar{x}}{s}$
Population    $x = \mu + z\sigma$      $z = \frac{x - \mu}{\sigma}$

Table 2.9

Example  2.23 

Two students, John and Ali, from different high schools, wanted to find out who had the highest
G.P.A. when compared to his school. Which student had the highest G.P.A. when compared to his
school?


Student    GPA     School Mean GPA    School Standard Deviation
John       2.85    3.0                0.7
Ali        77      80                 10

Table 2.10

Solution 

For each student, determine how many standard deviations (#ofSTDEVs) his GPA is away from
the average, for his school. Pay careful attention to signs when comparing and interpreting the
answer.

#ofSTDEVs = $\frac{\text{value} - \text{mean}}{\text{standard deviation}}$; $z = \frac{x - \mu}{\sigma}$

For John, z = #ofSTDEVs = $\frac{2.85 - 3.0}{0.7} = -0.21$

For Ali, z = #ofSTDEVs = $\frac{77 - 80}{10} = -0.3$

John has the better G.P.A. when compared to his school because his G.P.A. is 0.21 standard
deviations below his school's mean while Ali's G.P.A. is 0.3 standard deviations below his
school's mean.

John's z-score of -0.21 is higher than Ali's z-score of -0.3. For GPA, higher values are
better, so we conclude that John has the better GPA when compared to his school.
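
The comparison in Example 2.23 can be sketched in code. The function name `z_score` and the dictionary layout are illustrative choices; the numbers come from Table 2.10.

```python
def z_score(value, mean, stdev):
    """Number of standard deviations a value lies from its own data set's mean."""
    return (value - mean) / stdev

# (GPA, school mean GPA, school standard deviation) for each student.
students = {
    "John": (2.85, 3.0, 0.7),
    "Ali": (77, 80, 10),
}

z = {name: z_score(*row) for name, row in students.items()}
best = max(z, key=z.get)   # for GPA, the higher z-score is better

for name, score in z.items():
    print(f"{name}: z = {score:.2f}")
print("Better GPA relative to his school:", best)
```

The two GPAs are on different scales, so they cannot be compared directly; their z-scores can.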


The  following  lists  give  a  few  facts  that  provide  a  little  more  insight  into  what  the  standard  deviation  tells 
us  about  the  distribution  of  the  data. 

For  ANY  data  set,  no  matter  what  the  distribution  of  the  data  is: 

•  At  least  75%  of  the  data  is  within  2  standard  deviations  of  the  mean. 

•  At  least  89%  of  the  data  is  within  3  standard  deviations  of  the  mean. 

• At least 95% of the data is within 4.5 standard deviations of the mean.

•  This  is  known  as  Chebyshev's  Rule. 
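
The percentages in Chebyshev's Rule come from the bound 1 - 1/k^2, where k is the number of standard deviations (k greater than 1). The text does not state this formula, so the sketch below brings it in from the standard statement of the rule. As a check on a strongly skewed data set, it reuses the incomes from Example 2.18 with the population standard deviation.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum fraction of ANY data set within k standard deviations of the mean (k > 1)."""
    return 1 - 1 / k ** 2

for k in (2, 3, 4.5):
    print(f"k = {k}: at least {chebyshev_bound(k):.0%}")

# Check on a very skewed data set: the incomes from Example 2.18.
incomes = [5_000_000] + [30_000] * 49
m, sd = mean(incomes), pstdev(incomes)
within_2 = sum(abs(x - m) <= 2 * sd for x in incomes) / len(incomes)

print("fraction of incomes within 2 standard deviations:", within_2)
```

Even with an extreme outlier, the fraction within 2 standard deviations stays above the 75% guaranteed by the rule.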

For  data  having  a  distribution  that  is  MOUND-SHAPED  and  SYMMETRIC: 

•  Approximately  68%  of  the  data  is  within  1  standard  deviation  of  the  mean. 

•  Approximately  95%  of  the  data  is  within  2  standard  deviations  of  the  mean. 

•  More  than  99%  of  the  data  is  within  3  standard  deviations  of  the  mean. 

•  This  is  known  as  the  Empirical  Rule. 

• It is important to note that this rule only applies when the shape of the distribution of the data is
mound-shaped and symmetric. We will learn more about this when studying the "Normal" or "Gaussian"
probability distribution in later chapters.
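
The Empirical Rule percentages can be checked against the Normal distribution mentioned above. This sketch assumes a standard normal model (mean 0, standard deviation 1) as the idealized mound-shaped, symmetric distribution and uses `statistics.NormalDist` from Python 3.8 and later.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)   # assumed idealized model

def fraction_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

within = {k: fraction_within(k) for k in (1, 2, 3)}

for k, frac in within.items():
    print(f"within {k} standard deviation(s): {frac:.2%}")
```

Real data that is only roughly mound-shaped will match these percentages only approximately.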

**With contributions from Roberta Bloom

2.10 Summary of Formulas

Commonly Used Symbols

• The symbol $\sum$ means to add or to find the sum.

• n = the number of data values in a sample

• N = the number of people, things, etc. in the population

• $\bar{x}$ = the sample mean

• s = the sample standard deviation

• $\mu$ = the population mean

• $\sigma$ = the population standard deviation

• f = frequency

• x = numerical value

Commonly Used Expressions

• x * f = A value multiplied by its respective frequency

• $\sum x$ = The sum of the values

• $\sum x * f$ = The sum of values multiplied by their respective frequencies

• $(x - \bar{x})$ or $(x - \mu)$ = Deviations from the mean (how far a value is from the mean)

• $(x - \bar{x})^2$ or $(x - \mu)^2$ = Deviations squared

• $f(x - \bar{x})^2$ or $f(x - \mu)^2$ = The deviations squared and multiplied by their frequencies

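Mean Formulas:

• $\bar{x} = \frac{\sum x}{n}$ or $\bar{x} = \frac{\sum f \cdot x}{n}$

Standard Deviation Formulas:

• $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$ or $\sigma = \sqrt{\frac{\sum f \cdot (x - \mu)^2}{N}}$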

Formulas  Relating  a  Value,  the  Mean,  and  the  Standard  Deviation: 

• value = mean + (#ofSTDEVs)(standard deviation), where #ofSTDEVs = the number of standard deviations

• x = $\bar{x}$ + (#ofSTDEVs)(s)

• x = $\mu$ + (#ofSTDEVs)($\sigma$)


This content is available online at <http://cnx.org/content/m16310/1.9/>.

2.11 Practice 1: Center of the Data

2.11.1  Student  Learning  Outcomes 

•  The  student  will  calculate  and  interpret  the  center,  spread,  and  location  of  the  data. 

• The student will construct and interpret histograms and box plots.


2.11.2  Given 

Sixty-five  randomly  selected  car  salespersons  were  asked  the  number  of  cars  they  generally  sell  in  one 
week.  Fourteen  people  answered  that  they  generally  sell  three  cars;  nineteen  generally  sell  four  cars;  twelve 
generally  sell  five  cars;  nine  generally  sell  six  cars;  eleven  generally  sell  seven  cars. 

2.11.3  Complete  the  Table 


Data Value (# cars)    Frequency    Relative Frequency    Cumulative Relative Frequency

Table 2.11


2.11.4  Discussion  Questions 

Exercise  2.11.1  (Solution  on  p.  116.) 

What  does  the  frequency  column  sum  to?  Why? 

Exercise  2.11.2  (Solution  on  p.  116.) 

What  does  the  relative  frequency  column  sum  to?  Why? 

Exercise  2.11.3 

What  is  the  difference  between  relative  frequency  and  frequency  for  each  data  value? 

Exercise 2.11.4

What is the difference between cumulative relative frequency and relative frequency for each data
value?


2.11.5  Enter  the  Data 

Enter your data into your calculator or computer.

This content is available online at <http://cnx.org/content/m16312/1.12/>.

2.11.6 Construct a Histogram

Determine  appropriate  minimum  and  maximum  x  and  y  values  and  the  scaling.  Sketch  the  histogram 
below.  Label  the  horizontal  and  vertical  axes  with  words.  Include  numerical  scaling. 


2.11.7  Data  Statistics 

Calculate  the  following  values: 

Exercise  2.11.5  (Solution  on  p.  116.) 

Sample mean = $\bar{x}$ =

Exercise 2.11.6 (Solution on p. 116.)

Sample standard deviation = $s_x$ =

Exercise 2.11.7 (Solution on p. 116.)

Sample size = n =


2.11.8  Calculations 


Use  the  table  in  section  2.11.3  to  calculate  the  following  values: 

Exercise 2.11.8 (Solution on p. 116.)

Median =

Exercise 2.11.9 (Solution on p. 116.)

Mode =

Exercise 2.11.10 (Solution on p. 116.)

First quartile =

Exercise 2.11.11 (Solution on p. 117.)

Second quartile = median = 50th percentile =

Exercise 2.11.12 (Solution on p. 117.)

Third quartile =

Exercise 2.11.13 (Solution on p. 117.)

Interquartile range (IQR) = _____ - _____ = _____

Exercise 2.11.14 (Solution on p. 117.)

10th percentile =

Exercise 2.11.15 (Solution on p. 117.)

70th percentile =

Exercise 2.11.16 (Solution on p. 117.)

Find  the  value  that  is  3  standard  deviations: 

a.  Above  the  mean 

b.  Below  the  mean 
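
The relation value = mean + (#ofSTDEVs)(standard deviation) can be sketched in a few lines. The example below is only an illustration: it assumes the car-sales counts from section 2.11.2, uses the sample standard deviation (division by n - 1), and the helper name `value_at` is hypothetical.

```python
import math

# Counts from section 2.11.2: cars sold per week -> number of salespersons.
frequency = {3: 14, 4: 19, 5: 12, 6: 9, 7: 11}

n = sum(frequency.values())
mean = sum(x * f for x, f in frequency.items()) / n

# Sample standard deviation: weighted squared deviations divided by n - 1.
variance = sum(f * (x - mean) ** 2 for x, f in frequency.items()) / (n - 1)
s = math.sqrt(variance)

def value_at(k):
    """Return the value that lies k standard deviations from the mean."""
    return mean + k * s

above = value_at(3)    # 3 standard deviations above the mean
below = value_at(-3)   # 3 standard deviations below the mean

print(f"n = {n}, mean = {mean:.2f}, s = {s:.2f}")
print(f"3 SD above: {above:.2f}, 3 SD below: {below:.2f}")
```

A negative multiplier gives values below the mean, so the same formula covers both parts of the exercise.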


2.11.9  Box  Plot 

Construct a box plot below. Use a ruler to measure and scale accurately.

2.11.10  Interpretation 

Looking  at  your  box  plot,  does  it  appear  that  the  data  are  concentrated  together,  spread  out  evenly,  or 
concentrated in some areas, but not in others? How can you tell?

2.12 Practice 2: Spread of the Data

2.12.1  Student  Learning  Outcomes 

•  The  student  will  calculate  measures  of  the  center  of  the  data. 

•  The  student  will  calculate  the  spread  of  the  data. 


2.12.2  Given 

The population parameters below describe the full-time equivalent number of students (FTES) each year
at Lake Tahoe Community College from 1976-77 through 2004-2005. (Source: Graphically Speaking by Bill
King, LTCC Institutional Research, December 2005).

Use these values to answer the following questions:

• $\mu$ = 1000 FTES

• Median = 1014 FTES

• $\sigma$ = 474 FTES

• First quartile = 528.5 FTES

• Third quartile = 1447.5 FTES

• n = 29 years


2.12.3  Calculate  the  Values 


Exercise 2.12.1 (Solution on p. 117.)

A sample of 11 years is taken. About how many are expected to have a FTES of 1014 or above?
Explain how you determined your answer.

Exercise 2.12.2 (Solution on p. 117.)

75% of all years have a FTES:

a. At or below:

b. At or above:

Exercise 2.12.3 (Solution on p. 117.)

The population standard deviation =

Exercise 2.12.4 (Solution on p. 117.)

What percent of the FTES were from 528.5 to 1447.5? How do you know?

Exercise 2.12.5 (Solution on p. 117.)

What is the IQR? What does the IQR represent?

Exercise 2.12.6 (Solution on p. 117.)

How many standard deviations away from the mean is the median?

Additional Information: The population FTES for 2005-2006 through 2010-2011 was given in an updated
report. (Source: http://www.ltcc.edu/data/ResourcePDF/LTCC_FactBook_2010-11.pdf). The data are
reported here.


Year         2005-06   2006-07   2007-08   2008-09   2009-10   2010-11
Total FTES   1585      1690      1735      1935      2021      1890

Table 2.12

This content is available online at <http://cnx.org/content/m17105/1.12/>.

Exercise  2.12.7  (Solution  on  p.  117.) 

Calculate the mean, median, standard deviation, first quartile, third quartile, and IQR.
Round to one decimal place.
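
The calculation requested above can be sketched with Python's standard `statistics` module. Two assumptions are made here and should be checked against your own course conventions: the six years are treated as a population (so `pstdev` divides by N), and quartiles are taken as the medians of the lower and upper halves of the sorted data. Calculators and software packages may use other quartile rules. The function name `five_point_quartiles` is illustrative only.

```python
import statistics

# Total FTES for 2005-06 through 2010-11, from Table 2.12.
ftes = [1585, 1690, 1735, 1935, 2021, 1890]

def five_point_quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    Assumes at least two values. When the count is odd, the overall
    median is left out of both halves.
    """
    data = sorted(values)
    half = len(data) // 2
    lower, upper = data[:half], data[-half:]
    return statistics.median(lower), statistics.median(data), statistics.median(upper)

mean = statistics.mean(ftes)
sigma = statistics.pstdev(ftes)          # population standard deviation
q1, median, q3 = five_point_quartiles(ftes)
iqr = q3 - q1

print(f"mean = {mean:.1f}, median = {median:.1f}, sigma = {sigma:.1f}")
print(f"Q1 = {q1:.1f}, Q3 = {q3:.1f}, IQR = {iqr:.1f}")
```

If the six years were instead regarded as a sample, `statistics.stdev` (division by n - 1) would be the matching call.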

Exercise  2.12.8 

Construct a boxplot for the FTES for 2005-2006 through 2010-2011 and a boxplot for the FTES for
1976-1977 through 2004-2005.

Exercise  2.12.9  (Solution  on  p.  117.) 

Compare the IQR for the FTES for 1976-77 through 2004-2005 with the IQR for the FTES for 2005-2006
through 2010-2011. Why do you suppose the IQRs are so different?

2.13 Homework

Exercise  2.13.1  (Solution  on  p.  117.) 

Twenty-five randomly selected students were asked the number of movies they watched the previous
week. The results are as follows:


# of movies   Frequency   Relative Frequency   Cumulative Relative Frequency
-----------   ---------   ------------------   -----------------------------
0             5
1             9
2             6
3             4
4             1

Table 2.13


a. Find the sample mean $\bar{x}$.

b. Find the sample standard deviation, s.

c. Construct a histogram of the data.

d. Complete the columns of the chart.

e. Find the first quartile.

f. Find the median.

g. Find the third quartile.

h. Construct a box plot of the data.

i. What percent of the students saw fewer than three movies?

j. Find the 40th percentile.

k. Find the 90th percentile.

l. Construct a line graph of the data.

m. Construct a stem plot of the data.
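
Percentile questions such as parts (i) through (k) can be approached from the cumulative counts. The sketch below is one common convention, not necessarily the one your instructor uses: it returns the smallest data value whose cumulative count covers at least the requested percent of the data. The function name `percentile` is a hypothetical helper, and integer arithmetic is used to avoid rounding surprises.

```python
# Counts from Table 2.13: number of movies -> number of students.
frequency = {0: 5, 1: 9, 2: 6, 3: 4, 4: 1}
n = sum(frequency.values())

def percentile(pct):
    """Smallest data value whose cumulative count is at least pct percent of n."""
    running = 0
    for value in sorted(frequency):
        running += frequency[value]
        if running * 100 >= pct * n:   # integer comparison, no floating point
            return value
    raise ValueError("pct must be between 0 and 100")

# Percent of students who saw fewer than three movies.
fewer_than_three = 100 * sum(f for x, f in frequency.items() if x < 3) / n

print("40th percentile:", percentile(40))
print("90th percentile:", percentile(90))
print("percent with fewer than three movies:", fewer_than_three)
```

When the position of a percentile falls exactly between two different data values, some textbooks average the two neighbors; this sketch does not, so results can differ slightly in that case.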

Exercise  2.13.2 

The median age for U.S. blacks currently is 30.9 years; for U.S. whites it is 42.3
years. (Source: http://www.usatoday.com/news/nation/story/2012-05-17/minority-births-census/55029100/1)

a. Based upon this information, give two reasons why the black median age could be lower than the white median age.

b. Does the lower median age for blacks necessarily mean that blacks die younger than whites? Why or why not?

c. How might it be possible for blacks and whites to die at approximately the same age, but for the median age for whites to be higher?

Exercise  2.13.3  (Solution  on  p.  118.) 

Forty randomly selected students were asked the number of pairs of sneakers they owned. Let X
= the number of pairs of sneakers owned. The results are as follows:


This content is available online at <http://cnx.org/content/m16801/1.25/>.

X   Frequency   Relative Frequency   Cumulative Relative Frequency
-   ---------   ------------------   -----------------------------
1   2
2   5
3   8
4   12
5   12
7   1

Table 2.14


a. Find the sample mean $\bar{x}$.

b. Find the sample standard deviation, s.

c. Construct a histogram of the data.

d. Complete the columns of the chart.

e. Find the first quartile.

f. Find the median.

g. Find the third quartile.

h. Construct a box plot of the data.

i. What percent of the students owned at least five pairs?

j. Find the 40th percentile.

k. Find the 90th percentile.

l. Construct a line graph of the data.

m. Construct a stem plot of the data.

Exercise  2.13.4 

600 adult Americans were asked by telephone poll, "What do you think constitutes a middle-class
income?" The results are below. Also, include the left endpoint, but not the right endpoint. (Source:
Time magazine; survey by Yankelovich Partners, Inc.)

NOTE:  "Not  sure"  answers  were  omitted  from  the  results. 


Salary ($)        Relative Frequency
---------------   ------------------
< 20,000          0.02
20,000 - 25,000   0.09
25,000 - 30,000   0.19
30,000 - 40,000   0.26
40,000 - 50,000   0.18
50,000 - 75,000   0.17
75,000 - 99,999   0.02
100,000+          0.01

Table 2.15


a. What percent of the survey answered "not sure"?

b. What percent think that middle-class is from $25,000 - $50,000?

c. Construct a histogram of the data.

i. Should all bars have the same width, based on the data? Why or why not?

ii. How should the <20,000 and the 100,000+ intervals be handled? Why?

d. Find the 40th and 80th percentiles.

e. Construct a bar graph of the data.

Exercise  2.13.5  (Solution  on  p.  118.) 

Following are the published weights (in pounds) of all of the team members of the San Francisco
49ers from a previous year. (Source: San Jose Mercury News)

177;  205;  210;  210;  232;  205;  185;  185;  178;  210;  206;  212;  184;  174;  185;  242;  188;  212;  215;  247;  241; 
223;  220;  260;  245;  259;  278;  270;  280;  295;  275;  285;  290;  272;  273;  280;  285;  286;  200;  215;  185;  230; 
250;  241;  190;  260;  250;  302;  265;  290;  276;  228;  265 

a.  Organize  the  data  from  smallest  to  largest  value. 

b.  Find  the  median. 

c.  Find  the  first  quartile. 

d.  Find  the  third  quartile. 

e.  Construct  a  box  plot  of  the  data. 

f. The middle 50% of the weights are from _____ to _____.

g. If our population were all professional football players, would the above data be a sample of weights or the population of weights? Why?

h. If our population were the San Francisco 49ers, would the above data be a sample of weights or the population of weights? Why?

i. Assume the population was the San Francisco 49ers. Find:

i. the population mean, $\mu$.

ii. the population standard deviation, $\sigma$.

iii. the weight that is 2 standard deviations below the mean.

iv. When Steve Young, quarterback, played football, he weighed 205 pounds. How many standard deviations above or below the mean was he?

j. That same year, the mean weight for the Dallas Cowboys was 240.08 pounds with a standard
deviation of 44.38 pounds. Emmit Smith weighed in at 209 pounds. With respect to his team,
who was lighter, Smith or Young? How did you determine your answer?
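
Comparing two players relative to their own teams comes down to the number of standard deviations each weight lies from its team mean. The sketch below is a minimal illustration that treats the listed 49ers weights as a population, as part (i) assumes; the helper name `z_score` and the variable names are not from the text.

```python
import statistics

# Published 49ers weights (pounds) listed in the exercise.
weights = [
    177, 205, 210, 210, 232, 205, 185, 185, 178, 210, 206, 212, 184, 174,
    185, 242, 188, 212, 215, 247, 241, 223, 220, 260, 245, 259, 278, 270,
    280, 295, 275, 285, 290, 272, 273, 280, 285, 286, 200, 215, 185, 230,
    250, 241, 190, 260, 250, 302, 265, 290, 276, 228, 265,
]

mu = statistics.mean(weights)       # population mean
sigma = statistics.pstdev(weights)  # population standard deviation (divide by N)

def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

two_below = mu - 2 * sigma                 # weight 2 standard deviations below the mean
z_young = z_score(205, mu, sigma)          # Young, relative to the 49ers
z_smith = z_score(209, 240.08, 44.38)      # Smith, relative to the Cowboys figures given

print(f"mu = {mu:.2f}, sigma = {sigma:.2f}, mu - 2*sigma = {two_below:.2f}")
print(f"z (Young) = {z_young:.2f}, z (Smith) = {z_smith:.2f}")
```

The more negative of the two results identifies the player who was lighter with respect to his own team, even though the raw weights differ by only four pounds.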

Exercise  2.13.6 

An  elementary  school  class  ran  1  mile  with  a  mean  of  11  minutes  and  a  standard  deviation  of  3 
minutes.  Rachel,  a  student  in  the  class,  ran  1  mile  in  8  minutes.  A  junior  high  school  class  ran  1 
mile  with  a  mean  of  9  minutes  and  a  standard  deviation  of  2  minutes.  Kenji,  a  student  in  the  class, 
ran  1  mile  in  8.5  minutes.  A  high  school  class  ran  1  mile  with  a  mean  of  7  minutes  and  a  standard 
deviation  of  4  minutes.  Nedda,  a  student  in  the  class,  ran  1  mile  in  8  minutes. 

a.  Why  is  Kenji  considered  a  better  runner  than  Nedda,  even  though  Nedda  ran  faster  than  he? 

b.  Who  is  the  fastest  runner  with  respect  to  his  or  her  class?  Explain  why. 

Exercise  2.13.7 

In  a  survey  of  20  year  olds  in  China,  Germany  and  America,  people  were  asked  the  number  of 
foreign countries they had visited in their lifetime. The following box plots display the results.

[Figure: box plots for China, Germany, and America]


a. In complete sentences, describe what the shape of each box plot implies about the distribution of the data collected.

b. Explain how it is possible that more Americans than Germans surveyed have been to over eight foreign countries.

c. Compare the three box plots. What do they imply about the foreign travel of twenty year old residents of the three countries when compared to each other?

Exercise  2.13.8 

One  hundred  teachers  attended  a  seminar  on  mathematical  problem  solving.  The  attitudes  of 
a  representative  sample  of  12  of  the  teachers  were  measured  before  and  after  the  seminar.  A 
positive  number  for  change  in  attitude  indicates  that  a  teacher's  attitude  toward  math  became 
more  positive.  The  twelve  change  scores  are  as  follows: 

3; 8; -1; 2; 0; 5; -3; 1; -1; 6; 5; -2

a.  What  is  the  mean  change  score? 

b.  What  is  the  standard  deviation  for  this  population? 

c.  What  is  the  median  change  score? 

d.  Find  the  change  score  that  is  2.2  standard  deviations  below  the  mean. 

Exercise  2.13.9  (Solution  on  p.  118.) 

Three  students  were  applying  to  the  same  graduate  school.  They  came  from  schools  with  different 
grading  systems.  Which  student  had  the  best  G.P.A.  when  compared  to  his  school?  Explain  how 
you determined your answer.

Student   G.P.A.   School Ave. G.P.A.   School Standard Deviation
-------   ------   ------------------   -------------------------
Thuy      2.7      3.2                  0.8
Vichet    87       75                   20
Kamala    8.6      8                    0.4

Table 2.16

Exercise 2.13.10

Given  the  following  box  plot: 


[Box plot with labeled values 0, 2, 10, 12, 13]

a.  Which  quarter  has  the  smallest  spread  of  data?  What  is  that  spread? 

b.  Which  quarter  has  the  largest  spread  of  data?  What  is  that  spread? 

c.  Find  the  Inter  Quartile  Range  (IQR). 

d. Are there more data in the interval 5-10 or in the interval 10-13? How do you know this?

e. Which interval has the fewest data in it? How do you know this?

i. 0-2

ii. 2-4

iii. 10-12

iv. 12-13

Exercise  2.13.11 

Given  the  following  box  plot: 


[Box plot with labeled values 0, 20, 100, 150]

a. Think of an example (in words) where the data might fit into the above box plot. In 2-5 sentences, write down the example.

b. What does it mean to have the first and second quartiles so close together, while the second to fourth quartiles are far apart?

Exercise  2.13.12 

Santa Clara County, CA, has approximately 27,873 Japanese-Americans. Their ages are as follows.
(Source: West magazine)


Age Group   Percent of Community
---------   --------------------
0-17        18.9
18-24       8.0
25-34       22.8
35-44       15.0
45-54       13.1
55-64       11.9
65+         10.3

Table 2.17


a. Construct a histogram of the Japanese-American community in Santa Clara County, CA. The
bars will not be the same width for this example. Why not?

b. What percent of the community is under age 35?

c. Which box plot most resembles the information above?

i. [Box plot with labeled values 0, 24, 34, 53, ≈100]

ii. [Box plot with labeled values 0, 18, 34, 45, ≈100]

iii. [Box plot with labeled values 0, 24, 25, 54, ≈100]

Exercise  2.13.13 

Suppose  that  three  book  publishers  were  interested  in  the  number  of  fiction  paperbacks  adult 
consumers  purchase  per  month.  Each  publisher  conducted  a  survey.  In  the  survey,  each  asked 
adult  consumers  the  number  of  fiction  paperbacks  they  had  purchased  the  previous  month.  The 
results  are  below. 

Publisher A

# of books   Freq.   Rel. Freq.
----------   -----   ----------
0            10
1            12
2            16
3            12
4            8
5            6
6            2
8            2

Table 2.18

Publisher B

# of books   Freq.   Rel. Freq.
----------   -----   ----------
0            18
1            24
2            24
3            22
4            15
5            10
7            5
9            1

Table 2.19

Publisher C

# of books   Freq.   Rel. Freq.
----------   -----   ----------
0-1          20
2-3          35
4-5          12
6-7          2
8-9          1

Table 2.20


a. Find the relative frequencies for each survey. Write them in the charts.

b. Using a graphing calculator, a computer, or pencil and paper, use the frequency column to construct
a histogram for each publisher's survey. For Publishers A and B, make bar widths of 1. For
Publisher C, make bar widths of 2.

c. In complete sentences, give two reasons why the graphs for Publishers A and B are not identical.

d. Would you have expected the graph for Publisher C to look like the other two graphs? Why or why not?

e. Make new histograms for Publisher A and Publisher B. This time, make bar widths of 2.

f. Now, compare the graph for Publisher C to the new graphs for Publishers A and B. Are the graphs more similar or more different? Explain your answer.
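
Part (e) asks for the same data to be regrouped into wider bars. As a sketch of that regrouping, the code below combines Publisher A's counts into intervals of width 2; each bin is keyed by its left endpoint, so the key 0 stands for the interval 0-1, the key 2 for 2-3, and so on. The function name `rebin` is an illustrative choice, not something defined in the text.

```python
from collections import Counter

# Publisher A counts from Table 2.18: number of books -> frequency.
publisher_a = {0: 10, 1: 12, 2: 16, 3: 12, 4: 8, 5: 6, 6: 2, 8: 2}

def rebin(frequency, width):
    """Combine counts into bins of the given width, keyed by left endpoint."""
    bins = Counter()
    for value, count in frequency.items():
        left_endpoint = (value // width) * width
        bins[left_endpoint] += count
    return dict(sorted(bins.items()))

wide = rebin(publisher_a, 2)

for left, count in wide.items():
    print(f"{left}-{left + 1}: {count}")
```

The regrouped counts use the same intervals as Publisher C's chart, which makes the comparison asked for in part (f) more direct. The total count is unchanged by regrouping.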

Exercise  2.13.14 

Often, cruise ships conduct all on-board transactions, with the exception of gambling, on a cashless
basis. At the end of the cruise, guests pay one bill that covers all on-board transactions. Suppose
that 60 single travelers and 70 couples were surveyed as to their on-board bills for a seven-day
cruise from Los Angeles to the Mexican Riviera. Below is a summary of the bills for each group.

Singles

Amount($)   Frequency   Rel. Frequency
---------   ---------   --------------
51-100      5
101-150     10
151-200     15
201-250     15
251-300     10
301-350     5

Table 2.21

Couples

Amount($)   Frequency   Rel. Frequency
---------   ---------   --------------
100-150     5
201-250     5
251-300     5
301-350     5
351-400     10
401-450     10
451-500     10
501-550     10
551-600     5
601-650     5

Table 2.22


a. Fill in the relative frequency for each group.

b. Construct a histogram for the Singles group. Scale the x-axis by $50 widths. Use relative frequency on the y-axis.

c. Construct a histogram for the Couples group. Scale the x-axis by $50. Use relative frequency on the y-axis.

d. Compare the two graphs:

i. List two similarities between the graphs.

ii. List two differences between the graphs.

iii. Overall, are the graphs more similar or different?

e. Construct a new graph for the Couples by hand. Since each couple is paying for two individuals,
instead of scaling the x-axis by $50, scale it by $100. Use relative frequency on the
y-axis.

f. Compare the graph for the Singles with the new graph for the Couples:

i. List two similarities between the graphs.

ii. Overall, are the graphs more similar or different?

i. By scaling the Couples graph differently, how did it change the way you compared it to the
Singles?

j. Based on the graphs, do you think that individuals spend the same amount, more or less, as
singles as they do person by person in a couple? Explain why in one or two complete sentences.

Exercise  2.13.15  (Solution  on  p.  118.) 

Refer  to  the  following  histograms  and  box  plot.  Determine  which  of  the  following  are  true  and 
which  are  false.  Explain  your  solution  to  each  part  in  complete  sentences. 


[Figures: (a) histogram, horizontal axis labeled 1 through 5; (b) histogram, horizontal axis labeled 1 through 5; (c) box plot]

a.  The  medians  for  all  three  graphs  are  the  same. 

b.  We  cannot  determine  if  any  of  the  means  for  the  three  graphs  is  different. 

c.  The  standard  deviation  for  (b)  is  larger  than  the  standard  deviation  for  (a). 

d.  We  cannot  determine  if  any  of  the  third  quartiles  for  the  three  graphs  is  different. 

Exercise  2.13.16 

Refer to the following box plots.

[Figure: box plot for Data 1 with labeled values 0, 2, 4, 7; box plot for Data 2 with labeled values 0, 2, 7]

a.  In  complete  sentences,  explain  why  each  statement  is  false. 

i.  Data  1  has  more  data  values  above  2  than  Data  2  has  above  2. 

ii.  The  data  sets  cannot  have  the  same  mode. 

iii.  For  Data  1,  there  are  more  data  values  below  4  than  there  are  above  4. 

b. For which group, Data 1 or Data 2, is the value of "7" more likely to be an outlier? Explain why in complete sentences.

Exercise  2.13.17  (Solution  on  p.  119.) 

In a recent issue of the IEEE Spectrum, 84 engineering conferences were announced. Four conferences
lasted two days. Thirty-six lasted three days. Eighteen lasted four days. Nineteen lasted
five days. Four lasted six days. One lasted seven days. One lasted eight days. One lasted nine
days. Let X = the length (in days) of an engineering conference.

a.  Organize  the  data  in  a  chart. 

b.  Find  the  median,  the  first  quartile,  and  the  third  quartile. 

c.  Find  the  65th  percentile. 

d.  Find  the  10th  percentile. 

e.  Construct  a  box  plot  of  the  data. 

f. The middle 50% of the conferences last from _____ days to _____ days.

g.  Calculate  the  sample  mean  of  days  of  engineering  conferences. 

h.  Calculate  the  sample  standard  deviation  of  days  of  engineering  conferences. 

i.  Find  the  mode. 

j.  If  you  were  planning  an  engineering  conference,  which  would  you  choose  as  the  length  of  the 
conference:  mean;  median;  or  mode?  Explain  why  you  made  that  choice. 

k.  Give  two  reasons  why  you  think  that  3-5  days  seem  to  be  popular  lengths  of  engineering 
conferences. 

Exercise  2.13.18 

A survey of enrollment at 35 community colleges across the United States yielded the following
figures (Source: Microsoft Bookshelf):

6414;  1550;  2109;  9350;  21828;  4300;  5944;  5722;  2825;  2044;  5481;  5200;  5853;  2750;  10012;  6357; 
27000;  9414;  7681;  3200;  17500;  9200;  7380;  18314;  6557;  13713;  17768;  7493;  2771;  2861;  1263;  7285; 
28165;  5080;  11622 

a. Organize the data into a chart with five intervals of equal width. Label the two columns "Enrollment" and "Frequency."

b. Construct a histogram of the data.

c. If you were to build a new community college, which piece of information would be more valuable: the mode or the mean?

d. Calculate the sample mean.

e. Calculate the sample standard deviation.

f. A school with an enrollment of 8000 would be how many standard deviations away from the mean?

Exercise  2.13.19  (Solution  on  p.  119.) 

The median age of the U.S. population in 1980 was 30.0 years. In 1991, the median age was 33.1
years. (Source: Bureau of the Census)

a. What does it mean for the median age to rise?

b. Give two reasons why the median age could rise.

c. For the median age to rise, is the actual number of children less in 1991 than it was in 1980? Why or why not?


Exercise  2.13.20 

A  survey  was  conducted  of  130  purchasers  of  new  BMW  3  series  cars,  130  purchasers  of  new 
BMW  5  series  cars,  and  130  purchasers  of  new  BMW  7  series  cars.  In  it,  people  were  asked  the  age 
they were when they purchased their car. The following box plots display the results.

[Figure: box plots for BMW 3 series, BMW 5 series, and BMW 7 series; horizontal axis marked from 25 to 80 in steps of 5]


a. In complete sentences, describe what the shape of each box plot implies about the distribution of the data collected for that car series.

b. Which group is most likely to have an outlier? Explain how you determined that.

c. Compare the three box plots. What do they imply about the age of purchasing a BMW from the series when compared to each other?

d. Look at the BMW 5 series. Which quarter has the smallest spread of data? What is that spread?

e. Look at the BMW 5 series. Which quarter has the largest spread of data? What is that spread?

f. Look at the BMW 5 series. Estimate the Inter Quartile Range (IQR).

g. Look at the BMW 5 series. Are there more data in the interval 31-38 or in the interval 45-55? How do you know this?

h. Look at the BMW 5 series. Which interval has the fewest data in it? How do you know this?

i. 31-35

ii. 38-41

iii. 41-64

Exercise 2.13.21 (Solution on p. 119.)

The  following  box  plot  shows  the  U.S.  population  for  1990,  the  latest  available  year.  (Source: 
Bureau  of  the  Census,  1990  Census) 


[Box plot with labeled values 0, 17, 33, 50, ≈105]

a. Are there fewer or more children (age 17 and under) than senior citizens (age 65 and over)? How do you know?

b. 12.6% are age 65 and over. Approximately what percent of the population are of working age adults (above age 17 to age 65)?

Exercise  2.13.22 

Javier  and  Ercilia  are  supervisors  at  a  shopping  mall.  Each  was  given  the  task  of  estimating  the 
mean  distance  that  shoppers  live  from  the  mall.  They  each  randomly  surveyed  100  shoppers.  The 
samples  yielded  the  following  information: 


            Javier      Ercilia
---------   ---------   ---------
$\bar{x}$   6.0 miles   6.0 miles
s           4.0 miles   7.0 miles

Table 2.23


a. How can you determine which survey was correct?

b.  Explain  what  the  difference  in  the  results  of  the  surveys  implies  about  the  data. 

c. If the two histograms depict the distribution of values for each supervisor, which one depicts Ercilia's sample? How do you know?

[Figure: two histograms]

Figure 2.2

d. If the two box plots depict the distribution of values for each supervisor, which one depicts
Ercilia's sample? How do you know?

[Figure: two box plots, one with labeled values 0, 1, 6, 14, 21 and one with labeled values 0, 4, 6, 9, 12]

Figure 2.3


Exercise  2.13.23  (Solution  on  p.  119.) 

Student  grades  on  a  chemistry  exam  were: 

77, 78, 76, 81, 86, 51, 79, 82, 84, 99

a.  Construct  a  stem-and-leaf  plot  of  the  data. 

b. Are there any potential outliers? If so, which scores are they? Why do you consider them outliers?


2.13.1  Try  these  multiple  choice  questions  (Exercises  24  -  30). 

The  next  three  questions  refer  to  the  following  information.  We  are  interested  in  the  number  of  years 
students  in  a  particular  elementary  statistics  class  have  lived  in  California.  The  information  in  the  following 
table  is  from  the  entire  section. 


Number of years   Frequency
---------------   ---------
7                 1
14                3
15                1
18                1
19                4
20                3
22                1
23                1
26                1
40                2
42                2
                  Total = 20

Table 2.24


Exercise 2.13.24 (Solution on p. 119.)

What is the IQR?

A. 8

B. 11

C. 15

D. 35

Exercise 2.13.25 (Solution on p. 119.)

What is the mode?

A. 19

B. 19.5

C. 14 and 20

D. 22.65

Exercise 2.13.26 (Solution on p. 119.)

Is this a sample or the entire population?

A. sample

B. entire population

C. neither


The  next  two  questions  refer  to  the  following  table.  X  =  the  number  of  days  per  week  that  100  clients  use 
a  particular  exercise  facility. 


X   Frequency
-   ---------
0   3
1   12
2   33
3   28
4   11
5   9
6   4

Table 2.25


Exercise 2.13.27 (Solution on p. 119.)

The 80th percentile is:

A. 5

B. 80

C. 3

D. 4

Exercise 2.13.28 (Solution on p. 119.)

The number that is 1.5 standard deviations BELOW the mean is approximately:

A. 0.7

B. 4.8

C. -2.8

D. Cannot be determined

The next two questions refer to the following histogram. Suppose one hundred eleven people who
shopped in a special T-shirt store were asked the number of T-shirts they own costing more than $19 each.
[Figure: histogram; vertical axis: Relative Frequency, marked 10/111, 20/111, 30/111, 40/111; horizontal axis: Number of T-shirts costing more than $19 each; bar labels 39/111, 23/111, 17/111, 5/111, 25/111, and 2/111]

Exercise  2.13.29  (Solution  on  p.  119.) 

The percent of people that own at most three (3) T-shirts costing more than $19 each is
approximately:

A.  21 

B.  59 

C.  41 

D.  Cannot  be  determined 

Exercise  2.13.30  (Solution  on  p.  119.) 

If  the  data  were  collected  by  asking  the  first  111  people  who  entered  the  store,  then  the  type  of 
sampling  is: 

A.  cluster 

B.  simple  random 

C.  stratified 

D.  convenience 


Exercise  2.13.31  (Solution  on  p.  119.) 

Below are the 2010 obesity rates by U.S. states and Washington, DC. (Source:
http://www.cdc.gov/obesity/data/adult.html)

State            Percent (%)   State            Percent (%)
--------------   -----------   --------------   -----------
Alabama          32.2          Montana          23.0
Alaska           24.5          Nebraska         26.9
Arizona          24.3          Nevada           22.4
Arkansas         30.1          New Hampshire    25.0
California       24.0          New Jersey       23.8
Colorado         21.0          New Mexico       25.1
Connecticut      22.5          New York         23.9
Delaware         28.0          North Carolina   27.8
Washington, DC   22.2          North Dakota     27.2
Florida          26.6          Ohio             29.2
Georgia          29.6          Oklahoma         30.4
Hawaii           22.7          Oregon           26.8
Idaho            26.5          Pennsylvania     28.6
Illinois         28.2          Rhode Island     25.5
Indiana          29.6          South Carolina   31.5
Iowa             28.4          South Dakota     27.3
Kansas           29.4          Tennessee        30.8
Kentucky         31.3          Texas            31.0
Louisiana        31.0          Utah             22.5
Maine            26.8          Vermont          23.2
Maryland         27.1          Virginia         26.0
Massachusetts    23.0          Washington       25.5
Michigan         30.9          West Virginia    32.5
Minnesota        24.8          Wisconsin        26.3
Mississippi      34.0          Wyoming          25.1
Missouri         30.5

Table 2.26


a. Construct a bar graph of obesity rates of your state and the four states closest to your state. Hint: Label the x-axis with the states.

b. Use a random number generator to randomly pick 8 states. Construct a bar graph of the obesity rates of those 8 states.

c. Construct a bar graph for all the states beginning with the letter "A."

d. Construct a bar graph for all the states beginning with the letter "M."

Exercise  2.13.32  (Solution  on  p.  120.) 

A music school has budgeted to purchase 3 musical instruments. They plan to purchase a piano
costing $3000, a guitar costing $550, and a drum set costing $600. The mean cost for a piano is
$4,000 with a standard deviation of $2,500. The mean cost for a guitar is $500 with a standard
deviation of $200. The mean cost for drums is $700 with a standard deviation of $100. Which cost
is the lowest, when compared to other instruments of the same type? Which cost is the highest
when compared to other instruments of the same type? Justify your answer numerically.

Exercise  2.13.33  (Solution  on  p.  120.) 

Suppose that a publisher conducted a survey asking adult consumers the number of fiction paperback
books they had purchased in the previous month. The results are summarized in the table
below. (Note that this is the data presented for publisher B in homework exercise 13).

Publisher B

# of books   Freq.   Rel. Freq.
----------   -----   ----------
0            18
1            24
2            24
3            22
4            15
5            10
7            5
9            1

Table 2.27


a.  Are  there  any  outliers  in  the  data?  Use  an  appropriate  numerical  test  involving  the  IQR  to 
identify  outliers,  if  any,  and  clearly  state  your  conclusion. 

b.  If  a  data  value  is  identified  as  an  outlier,  what  should  be  done  about  it? 

c.  Are any data values further than 2 standard deviations away from the mean? In some
situations, statisticians may use this criterion to identify data values that are unusual, compared
to the other data values. (Note that this criterion is most appropriate to use for data that is
mound-shaped and symmetric, rather than for skewed data.)

d.  Do parts (a) and (c) of this problem give the same answer?

e.  Examine the shape of the data. Which part, (a) or (c), of this question gives a more
appropriate result for this data?

f.  Based on the shape of the data, which is the most appropriate measure of center for this data:
mean, median, or mode?

**Exercises 32 and 33 contributed by Roberta Bloom

2.14 Lab: Descriptive Statistics^

Class  Time: 
Names: 

2.14.1  Student  Learning  Outcomes 

•  The  student  will  construct  a  histogram  and  a  box  plot. 

•  The  student  will  calculate  univariate  statistics. 

•  The  student  will  examine  the  graphs  to  interpret  what  the  data  implies. 

2.14.2  Collect  the  Data 

Record  the  number  of  pairs  of  shoes  you  own: 

1.  Randomly  survey  30  classmates.  Record  their  values. 

Survey  Results 


Table  2.28 

2.  Construct  a  histogram.  Make  5-6  intervals.  Sketch  the  graph  using  a  ruler  and  pencil.  Scale  the  axes. 

^This content is available online at <http://cnx.org/content/m16299/1.13/>.

Figure 2.4: Blank axes for the histogram (vertical axis: Frequency; horizontal axis: Number of Pairs of Shoes)


3.  Calculate the following:

•  x̄ =

•  s =

4.  Are  the  data  discrete  or  continuous?  How  do  you  know? 

5.  Describe  the  shape  of  the  histogram.  Use  complete  sentences. 

6.  Are  there  any  potential  outliers?  Which  value(s)  is  (are)  it  (they)?  Use  a  formula  to  check  the  end 
values  to  determine  if  they  are  potential  outliers. 


2.14.3  Analyze  the  Data 

1.  Determine  the  following: 

•  Minimum  value  = 

•  Median  = 

•  Maximum value =

•  First  quartile  = 

•  Third  quartile  = 

•  IQR  = 

2.  Construct a box plot of the data.

3.  What  does  the  shape  of  the  box  plot  imply  about  the  concentration  of  data?  Use  complete  sentences. 

4.  Using  the  box  plot,  how  can  you  determine  if  there  are  potential  outliers? 

5.  How  does  the  standard  deviation  help  you  to  determine  concentration  of  the  data  and  whether  or  not 
there  are  potential  outliers? 

6.  What  does  the  IQR  represent  in  this  problem? 

7.  Show  your  work  to  find  the  value  that  is  1.5  standard  deviations: 

a.  Above  the  mean: 

b.  Below the mean:

Solutions to Exercises in Chapter 2

Solution  to  Example  2.2,  Problem  (p.  61) 

The  value  12.3  may  be  an  outlier.  Values  appear  to  concentrate  at  3  and  4  kilometers. 


Stem | Leaf
1    | 1 5
2    | 3 5 7
3    | 2 3 3 5 8
4    | 0 2 5 5 7 8
5    | 5 6
6    | 5 7
7    |
8    |
9    |
10   |
11   |
12   | 3

Table 2.29


Solution  to  Example  2.7,  Problem  (p.  66) 

•  3.5  to  4.5 

•  4.5  to  5.5 

•  6 

•  5.5  to  6.5 

Solution  to  Example  2.9,  Problem  (p.  70) 
First Data Set

•  Xmin = 32

•  Q1 = 56

•  M = 74.5

•  Q3 = 82.5

•  Xmax = 99

Second Data Set

•  Xmin = 25.5

•  Q1 = 78

•  M = 81

•  Q3 = 89

•  Xmax = 98

Figure: box plots of the two data sets, drawn on a common scale from 20 to 100.

Solution  to  Example  2.11,  Problem  (p.  72) 

For the IQRs, see the answer to the test scores example (Solution to Example 2.9: p. 114). The first data set
has the larger IQR, so the scores between Q1 and Q3 (middle 50%) for the first data set are more spread out
and not clustered about the median.

First Data Set

•  (1.5) · (IQR) = (1.5) · (26.5) = 39.75

•  Xmax - Q3 = 99 - 82.5 = 16.5

•  Q1 - Xmin = 56 - 32 = 24

(1.5) · (IQR) = 39.75 is larger than 16.5 and larger than 24, so the first set has no outliers.

Second Data Set

•  (1.5) · (IQR) = (1.5) · (11) = 16.5

•  Xmax - Q3 = 98 - 89 = 9

•  Q1 - Xmin = 78 - 25.5 = 52.5

(1.5) · (IQR) = 16.5 is larger than 9 but smaller than 52.5, so for the second set 45 and 25.5 are outliers.
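
The comparison above can be sketched in a few lines of Python. This is only an illustration: the function name possible_outliers and the keys it returns are our own choices, and the inputs are the five-number-summary values listed in the Solution to Example 2.9.

```python
def possible_outliers(xmin, q1, q3, xmax):
    """Compare each end of a five-number summary with 1.5 * IQR."""
    iqr = q3 - q1
    reach = 1.5 * iqr
    return {
        "iqr": iqr,
        "reach": reach,
        # True means the extreme value lies more than 1.5 * IQR beyond the quartile
        "low_end": (q1 - xmin) > reach,
        "high_end": (xmax - q3) > reach,
    }

first = possible_outliers(32, 56, 82.5, 99)
second = possible_outliers(25.5, 78, 89, 98)
print(first)
print(second)
```

A True entry only says that the minimum or maximum is a potential outlier; the full data set is still needed to see which other values fall outside the same limits.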

To  find  the  percentiles,  create  a  frequency,  relative  frequency,  and  cumulative  relative  frequency  chart  (see 
"Frequency"  from  the  Sampling  and  Data  Chapter  (Section  1.9)).  Get  the  percentiles  from  that  chart. 

First  Data  Set 

•  30th %ile (between the 6th and 7th values) = (56 + 59)/2 = 57.5

•  80th %ile (between the 16th and 17th values) = (84 + 84.5)/2 = 84.25

Second  Data  Set 

•  30th  %ile  (7th  value)  =  78 

•  80th  %ile  (18th  value)  =  90 

30%  of  the  data  falls  below  the  30th  %ile,  and  20%  falls  above  the  80th  %ile. 
Solution  to  Example  2.13,  Problem  (p.  73) 

1.  (8 + 9)/2 = 8.5

Look  where  cum.  rel.  freq.  =  0.80.  80%  of  the  data  is  8  or  less.  80th  %ile  is  between  the  last  8  and  first 
9. 

2.  9 

3.  6 

4.  First  Quartile  =  25th  %ile 
Solution  to  Exercise  2.6.1  (p.  75) 

a.  For runners in a race it is more desirable to have a low percentile for finish time. A low percentile means
a short time, which is faster.

b.  INTERPRETATION: 20% of runners finished the race in 5.2 minutes or less. 80% of runners finished the
race in 5.2 minutes or longer.

c.  He is among the slowest cyclists (90% of cyclists were faster than him). INTERPRETATION: 90% of
cyclists had a finish time of 1 hour, 12 minutes or less. Only 10% of cyclists had a finish time of 1 hour,
12 minutes or longer.

Solution  to  Exercise  2.6.2  (p.  75) 

a.  For runners in a race it is more desirable to have a high percentile for speed. A high percentile means a
higher speed, which is faster.

b.  INTERPRETATION: 40% of runners ran at speeds of 7.5 miles per hour or less (slower). 60% of runners
ran at speeds of 7.5 miles per hour or more (faster).

Solution  to  Exercise  2.6.3  (p.  75) 

On  an  exam  you  would  prefer  a  high  percentile;  higher  percentiles  correspond  to  higher  grades  on  the 
exam. 

Solution  to  Exercise  2.6.4  (p.  75) 

When waiting in line at the DMV, the 85th percentile would be a long wait time compared to the other
people waiting. 85% of people had shorter wait times than you did. In this context, you would prefer a
wait time corresponding to a lower percentile. INTERPRETATION: 85% of people at the DMV waited 32
minutes or less. 15% of people at the DMV waited 32 minutes or longer.
Solution  to  Exercise  2.6.5  (p.  75) 

Li  should  be  pleased.  Her  salary  is  relatively  high  compared  to  other  recent  college  grads.  78%  of  recent 
college  graduates  earn  less  than  Li  does.  22%  of  recent  college  graduates  earn  more  than  Li  does. 
Solution  to  Exercise  2.6.6  (p.  75) 

The  manufacturer  and  the  consumer  would  be  upset.  This  is  a  large  repair  cost  for  the  damages,  compared 
to  the  other  cars  in  the  sample.  INTERPRETATION:  90%  of  the  crash  tested  cars  had  damage  repair  costs 
of  $1700  or  less;  only  10%  had  damage  repair  costs  of  $1700  or  more. 
Solution  to  Exercise  2.6.7  (p.  75) 

a.  The top 12% of students are those who are at or above the 88th percentile of admissions index scores.

b.  The top 4% of students' GPAs are at or above the 96th percentile, making the top 4% of students "eligible
in the local context".

Solution  to  Exercise  2.6.8  (p.  75) 

You  can  afford  34%  of  houses.  66%  of  the  houses  are  too  expensive  for  your  budget.  INTERPRETATION: 
34%  of  houses  cost  $240,000  or  less.  66%  of  houses  cost  $240,000  or  more. 

Solutions  to  Practice  1:  Center  of  the  Data 

Solution  to  Exercise  2.11.1  (p.  90) 

65 

Solution  to  Exercise  2.11.2  (p.  90) 

1 

Solution  to  Exercise  2.11.5  (p.  91) 

4.75 

Solution  to  Exercise  2.11.6  (p.  91) 

1.39 

Solution  to  Exercise  2.11.7  (p.  91) 

65 

Solution  to  Exercise  2.11.8  (p.  91) 

4 

Solution to Exercise 2.11.9 (p. 91)

4

Solution to Exercise 2.11.10 (p. 91)

4

Solution to Exercise 2.11.11 (p. 91)

4

Solution to Exercise 2.11.12 (p. 91)

6

Solution to Exercise 2.11.13 (p. 91)

6 - 4 = 2

Solution to Exercise 2.11.14 (p. 91)

3

Solution to Exercise 2.11.15 (p. 91)

6

Solution to Exercise 2.11.16 (p. 92)

a.  8.93

b.  0.58

Solutions to Practice 2: Spread of the Data

Solution to Exercise 2.12.1 (p. 93)

6

Solution to Exercise 2.12.2 (p. 93)

a.  1447.5

b.  528.5

Solution to Exercise 2.12.3 (p. 93)

474 FTES

Solution to Exercise 2.12.4 (p. 93)

50%

Solution to Exercise 2.12.5 (p. 93)

919

Solution to Exercise 2.12.6 (p. 93)

0.03

Solution to Exercise 2.12.7 (p. 94)

mean = 1809.3

median = 1812.5

standard deviation = 151.2

First quartile = 1690

Third quartile = 1935

IQR = 245

Solution to Exercise 2.12.9 (p. 94)

Hint: Think about the number of years covered by each time period and what happened to higher
education during those periods.

Solutions  to  Homework 
Solution  to  Exercise  2.13.1  (p.  95) 

a.  1.48 

b.  1.12 

e.  1 

f.  1

h.  0  12  3  4

i.  80%

j.  1

k.  3

Solution  to  Exercise  2.13.3  (p.  95) 

a.  3.78 

b.  1.29 

e.  3 

f.  4 

g.  5

h.  1  3  4  5  7

i.  32.5%

j.  4

k.  5

Solution  to  Exercise  2.13.5  (p.  97) 

b.  241 

c.  205.5 

d.  272.5 


e.  174  205.5  241  272.5  302

f.  205.5, 272.5

g.  sample

h.  population

i.  i.  236.34

    ii.  37.50

    iii.  161.34

    iv.  0.84 std. dev. below the mean

j.  Young

Solution  to  Exercise  2.13.9  (p.  98) 

Kamala 

Solution  to  Exercise  2.13.15  (p.  103) 
a.  True

b.  True

c.  True 

d.  False 

Solution  to  Exercise  2.13.17  (p.  104) 

b.  4, 3, 5

c.  4

d.  3

e.  2  3  4  5  9

f.  3, 5

g.  3.94 

h.  1.28 

i.  3 

j.  mode 

Solution  to  Exercise  2.13.19  (p.  105) 

c.  Maybe 

Solution  to  Exercise  2.13.21  (p.  106) 

a.  more  children 

b.  62.4% 

Solution  to  Exercise  2.13.23  (p.  107) 

b.  51, 99

Solution  to  Exercise  2.13.24  (p.  107) 

A 

Solution  to  Exercise  2.13.25  (p.  108) 

A 

Solution  to  Exercise  2.13.26  (p.  108) 

B 

Solution  to  Exercise  2.13.27  (p.  108) 

D 

Solution  to  Exercise  2.13.28  (p.  108) 

A 

Solution  to  Exercise  2.13.29  (p.  109) 

C 

Solution  to  Exercise  2.13.30  (p.  109) 

D 

Solution  to  Exercise  2.13.31  (p.  109) 

Example solution for b using the random number generator for the TI-84 Plus to generate a simple random
sample of 8 states. Instructions are below.

Number the entries in the table 1-51 (Includes Washington, DC; Numbered vertically)
Press MATH
Arrow over to PRB
Press 5:randInt(
Enter 51,1,8)

Eight numbers are generated (use the right arrow key to scroll through the numbers). The
numbers correspond to the numbered states (for this example: {47 21 9 23 51 13 25 4}. If
any numbers are repeated, generate a different number by using 5:randInt(51,1)). Here, the
states (and Washington DC) are {Arkansas, Washington DC, Idaho, Maryland, Michigan,
Mississippi, Virginia, Wyoming}. Corresponding percents are {28.7 21.8 24.5 26 28.9 32.8 25 24.6}.
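
If a calculator is not at hand, the same kind of draw can be sketched in Python. This is an alternative we are suggesting, not part of the original solution; it assumes the table entries are numbered 1 through 51 as described above. Because random.sample draws without repeats, no number has to be regenerated.

```python
import random

rng = random.Random()  # pass a seed, e.g. random.Random(1), to make the draw repeatable

# Draw 8 distinct entry numbers from 1 through 51 (50 states plus Washington, DC)
picks = rng.sample(range(1, 52), 8)
print(sorted(picks))
```

Each run gives a different set of eight entry numbers, so your states will usually differ from the ones in the example.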

Figure: bar graph of the obesity rates (vertical axis: Percent, scaled from 0 to 40) for Arkansas,
Washington DC, Idaho, Maryland, Michigan, Mississippi, Virginia, and Wyoming.

Solution  to  Exercise  2.13.32  (p.  110) 

For pianos, the cost of the piano is 0.4 standard deviations BELOW the mean. For guitars, the cost of the
guitar is 0.25 standard deviations ABOVE the mean. For drums, the cost of the drum set is 1.0 standard
deviations BELOW the mean. Of the three, the drum set costs the least in comparison to the cost of other
instruments of the same type. The guitar costs the most in comparison to the cost of other instruments of the
same type.
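
The numbers of standard deviations quoted above come from (cost - mean) / (standard deviation). A minimal Python sketch of that arithmetic follows; the dictionary layout and variable names are our own, and the figures are those given in Exercise 2.13.32.

```python
# (purchase cost, mean cost, standard deviation) from Exercise 2.13.32
instruments = {
    "piano": (3000, 4000, 2500),
    "guitar": (550, 500, 200),
    "drums": (600, 700, 100),
}

# Number of standard deviations each purchase is above (+) or below (-) its mean
z = {name: (cost - mean) / sd for name, (cost, mean, sd) in instruments.items()}

lowest = min(z, key=z.get)    # lowest cost relative to its own type
highest = max(z, key=z.get)   # highest cost relative to its own type
print(z, lowest, highest)
```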

Solution to Exercise 2.13.33 (p. 111)

a.  IQR = 4 - 1 = 3; Q1 - 1.5*IQR = 1 - 1.5(3) = -3.5; Q3 + 1.5*IQR = 4 + 1.5(3) = 8.5. The data value of 9 is
larger than 8.5. The purchase of 9 books in one month is an outlier.

b.  The outlier should be investigated to see if there is an error or some other problem in the data; then a
decision whether to include or exclude it should be made based on the particular situation. If it was
a correct value then the data value should remain in the data set. If there is a problem with this data
value, then it should be corrected or removed from the data. For example: If the data was recorded
incorrectly (perhaps a 9 was miscoded and the correct value was 6) then the data should be corrected.
If it was an error but the correct value is not known it should be removed from the data set.

c.  xbar - 2s = 2.45 - 2*1.88 = -1.31; xbar + 2s = 2.45 + 2*1.88 = 6.21. Using this method, the five data values
of 7 books purchased and the one data value of 9 books purchased would be considered unusual.

d.  No: part (a) identifies only the value of 9 to be an outlier but part (c) identifies both 7 and 9.

e.  The data is skewed (to the right). It would be more appropriate to use the method involving the IQR
in part (a), identifying only the one value of 9 books purchased as an outlier. Note that part (c) remarks
that identifying unusual data values by using the criterion of being further than 2 standard deviations
away from the mean is most appropriate when the data are mound-shaped and symmetric.

f.  The data are skewed to the right. For skewed data it is more appropriate to use the median as a
measure of center.
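
The two checks in this solution can be reproduced from the frequency table with a short Python sketch. It assumes the quartiles are found as the medians of the lower and upper halves of the sorted data (leaving out the middle value when the count is odd); other quartile conventions may give slightly different results for other data sets. The variable names are ours.

```python
from statistics import mean, median, stdev

# Table 2.27 (Publisher B): number of books -> frequency
freq = {0: 18, 1: 24, 2: 24, 3: 22, 4: 15, 5: 10, 7: 5, 9: 1}
data = sorted(x for x, f in freq.items() for _ in range(f))

n = len(data)
half = n // 2
lower = data[:half]              # values below the median position
upper = data[half + (n % 2):]    # values above the median position
q1, q3 = median(lower), median(upper)
iqr = q3 - q1
low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
iqr_outliers = sorted({x for x in data if x < low_fence or x > high_fence})

xbar, s = mean(data), stdev(data)   # sample mean and sample standard deviation
two_sd_unusual = sorted({x for x in data if abs(x - xbar) > 2 * s})

print(q1, q3, iqr, low_fence, high_fence, iqr_outliers)
print(round(xbar, 2), round(s, 2), two_sd_unusual)
```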
Chapter 3

Probability  Topics 


3.1  Probability  Topics^ 

3.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

•  Understand  and  use  the  terminology  of  probability 

•  Determine  whether  two  events  are  mutually  exclusive  and  whether  two  events  are  independent. 

•  Calculate  probabilities  using  the  Addition  Rules  and  Multiplication  Rules. 

•  Construct  and  interpret  Contingency  Tables. 

•  Construct  and  interpret  Venn  Diagrams  (optional). 

•  Construct  and  interpret  Tree  Diagrams  (optional). 


3.1.2  Introduction 

It  is  often  necessary  to  "guess"  about  the  outcome  of  an  event  in  order  to  make  a  decision.  Politicians  study 
polls  to  guess  their  likelihood  of  winning  an  election.  Teachers  choose  a  particular  course  of  study  based 
on  what  they  think  students  can  comprehend.  Doctors  choose  the  treatments  needed  for  various  diseases 
based  on  their  assessment  of  likely  results.  You  may  have  visited  a  casino  where  people  play  games  chosen 
because  of  the  belief  that  the  likelihood  of  winning  is  good.  You  may  have  chosen  your  course  of  study 
based  on  the  probable  availability  of  jobs. 

You have, more than likely, used probability. In fact, you probably have an intuitive sense of probability.
Probability deals with the chance of an event occurring. Whenever you weigh the odds of whether or not
to do your homework or to study for an exam, you are using probability. In this chapter, you will learn to
solve probability problems using a systematic approach.

3.1.3  Optional  Collaborative  Classroom  Exercise 

Your instructor will survey your class. Count the number of students in the class today.

•  Raise  your  hand  if  you  have  any  change  in  your  pocket  or  purse.  Record  the  number  of  raised  hands. 

•  Raise  your  hand  if  you  rode  a  bus  within  the  past  month.  Record  the  number  of  raised  hands. 

•  Raise  your  hand  if  you  answered  "yes"  to  BOTH  of  the  first  two  questions.  Record  the  number  of 
raised  hands. 

^This content is available online at <http://cnx.org/content/m16838/1.11/>.


Use the class data as estimates of the following probabilities. P(change) means the probability that a
randomly chosen person in your class has change in his/her pocket or purse. P(bus) means the probability that
a randomly chosen person in your class rode a bus within the last month, and so on. Discuss your answers.

•  Find P(change).

•  Find P(bus).

•  Find P(change and bus). Find the probability that a randomly chosen student in your class has change
in his/her pocket or purse and rode a bus within the last month.

•  Find P(change | bus). Find the probability that a randomly chosen student has change given that
he/she rode a bus within the last month. Count all the students that rode a bus. From the group
of students who rode a bus, count those who have change. The probability is equal to those who have
change and rode a bus divided by those who rode a bus.
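
The four estimates can be computed directly from the raised-hand counts. The sketch below uses made-up tallies (class_size, n_change, n_bus and n_both are hypothetical values, not data from the text); substitute your own class counts.

```python
from fractions import Fraction

# Hypothetical tallies for illustration only; replace with your class data
class_size = 30   # students present today
n_change = 18     # have change
n_bus = 12        # rode a bus within the past month
n_both = 8        # answered "yes" to both questions

p_change = Fraction(n_change, class_size)
p_bus = Fraction(n_bus, class_size)
p_change_and_bus = Fraction(n_both, class_size)

# Conditional probability: restrict attention to the students who rode a bus
p_change_given_bus = Fraction(n_both, n_bus)

print(p_change, p_bus, p_change_and_bus, p_change_given_bus)
```

Note that the last estimate divides by the number of bus riders, not by the class size.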


3.2  Terminology^ 

Probability  is  a  measure  that  is  associated  with  how  certain  we  are  of  outcomes  of  a  particular  experiment 
or  activity.  An  experiment  is  a  planned  operation  carried  out  under  controlled  conditions.  If  the  result  is 
not  predetermined,  then  the  experiment  is  said  to  be  a  chance  experiment.  Flipping  one  fair  coin  twice  is 
an  example  of  an  experiment. 

The result of an experiment is called an outcome. A sample space is a set of all possible outcomes. Three
ways to represent a sample space are to list the possible outcomes, to create a tree diagram, or to create a
Venn diagram. The uppercase letter S is used to denote the sample space. For example, if you flip one fair
coin, S = {H, T} where H = heads and T = tails are the outcomes.

An  event  is  any  combination  of  outcomes.  Upper  case  letters  like  A  and  B  represent  events.  For  example, 
if  the  experiment  is  to  flip  one  fair  coin,  event  A  might  be  getting  at  most  one  head.  The  probability  of  an 
event  A  is  written  P  (A). 

The probability of any outcome is the long-term relative frequency of that outcome. Probabilities are
between 0 and 1, inclusive (includes 0 and 1 and all numbers between these values). P(A) = 0 means
the event A can never happen. P(A) = 1 means the event A always happens. P(A) = 0.5 means the
event A is equally likely to occur or not to occur. For example, if you flip one fair coin repeatedly (from 20
to 2,000 to 20,000 times) the relative frequency of heads approaches 0.5 (the probability of heads).

Equally  likely  means  that  each  outcome  of  an  experiment  occurs  with  equal  probability.  For  example,  if 
you  toss  a  fair,  six-sided  die,  each  face  (1,  2,  3,  4,  5,  or  6)  is  as  likely  to  occur  as  any  other  face.  If  you 
toss  a  fair  coin,  a  Head(H)  and  a  Tail(T)  are  equally  likely  to  occur.  If  you  randomly  guess  the  answer  to  a 
true/false question on an exam, you are equally likely to select a correct answer or an incorrect answer.

To calculate the probability of an event A when all outcomes in the sample space are equally likely,
count the number of outcomes for event A and divide by the total number of outcomes in the sample space.
For example, if you toss a fair dime and a fair nickel, the sample space is {HH, TH, HT, TT} where T =
tails and H = heads. The sample space has four outcomes. A = getting one head. There are two outcomes
{HT, TH}. P(A) = 2/4.
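
The counting rule can be illustrated by listing the sample space in Python. This sketch assumes, as the text does, that all four outcomes are equally likely; the variable names are ours.

```python
from fractions import Fraction
from itertools import product

# Each outcome is a two-letter string: first letter = dime, second letter = nickel
sample_space = ["".join(pair) for pair in product("HT", repeat=2)]

# Event A: exactly one head
event_a = [outcome for outcome in sample_space if outcome.count("H") == 1]

p_a = Fraction(len(event_a), len(sample_space))
print(sample_space, event_a, p_a)
```

Fraction reduces 2/4 to 1/2, which is the same probability.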

Suppose you roll one fair six-sided die, with the numbers {1,2,3,4,5,6} on its faces. Let event E = rolling a
number that is at least 5. There are two outcomes {5, 6}. P(E) = 2/6. If you were to roll the die only a few
times, you would not be surprised if your observed results did not match the probability. If you were to
roll the die a very large number of times, you would expect that, overall, 2/6 of the rolls would result in an
outcome of "at least 5". You would not expect exactly 2/6. The long-term relative frequency of obtaining
this result would approach the theoretical probability of 2/6 as the number of repetitions grows larger and
larger.

^This content is available online at <http://cnx.org/content/m16845/1.13/>.

This important characteristic of probability experiments is known as the Law of Large Numbers: as
the  number  of  repetitions  of  an  experiment  is  increased,  the  relative  frequency  obtained  in  the  experiment 
tends  to  become  closer  and  closer  to  the  theoretical  probability.  Even  though  the  outcomes  don't  happen 
according  to  any  set  pattern  or  order,  overall,  the  long-term  observed  relative  frequency  will  approach  the 
theoretical  probability.  (The  word  empirical  is  often  used  instead  of  the  word  observed.)  The  Law  of  Large 
Numbers  will  be  discussed  again  in  Chapter  7. 
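
A small simulation can make this behavior visible. The sketch below is an illustration only: it uses Python's pseudo-random number generator in place of a physical die, and the function name relative_frequency and the chosen numbers of rolls are our own.

```python
import random

def relative_frequency(n_rolls, seed=0):
    """Simulate n_rolls of a fair die; return the fraction of rolls that are at least 5."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) >= 5)
    return hits / n_rolls

for n in (20, 2_000, 200_000):
    print(n, relative_frequency(n))
```

For small numbers of rolls the relative frequency can be far from 2/6; for large numbers it is typically very close, though rarely exactly equal.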

It  is  important  to  realize  that  in  many  situations,  the  outcomes  are  not  equally  likely.  A  coin  or  die  may 
be unfair, or biased. Two math professors in Europe had their statistics students test the Belgian 1 Euro
coin  and  discovered  that  in  250  trials,  a  head  was  obtained  56%  of  the  time  and  a  tail  was  obtained  44% 
of  the  time.  The  data  seem  to  show  that  the  coin  is  not  a  fair  coin;  more  repetitions  would  be  helpful  to 
draw  a  more  accurate  conclusion  about  such  bias.  Some  dice  may  be  biased.  Look  at  the  dice  in  a  game  you 
have  at  home;  the  spots  on  each  face  are  usually  small  holes  carved  out  and  then  painted  to  make  the  spots 
visible.  Your  dice  may  or  may  not  be  biased;  it  is  possible  that  the  outcomes  may  be  affected  by  the  slight 
weight  differences  due  to  the  different  numbers  of  holes  in  the  faces.  Gambling  casinos  have  a  lot  of  money 
depending  on  outcomes  from  rolling  dice,  so  casino  dice  are  made  differently  to  eliminate  bias.  Casino  dice 
have  flat  faces;  the  holes  are  completely  filled  with  paint  having  the  same  density  as  the  material  that  the 
dice are made out of so that each face is equally likely to occur. Later in this chapter we will learn techniques
for working with probabilities for events that are not equally likely.

"OR"  Event: 

An outcome is in the event A OR B if the outcome is in A or is in B or is in both A and B. For example, let
A = {1, 2, 3, 4, 5} and B = {4, 5, 6, 7, 8}. A OR B = {1, 2, 3, 4, 5, 6, 7, 8}. Notice that 4 and 5 are
NOT listed twice.

"AND" Event:

An outcome is in the event A AND B if the outcome is in both A and B at the same time. For example, let
A and B be {1, 2, 3, 4, 5} and {4, 5, 6, 7, 8}, respectively. Then A AND B = {4, 5}.

The complement of event A is denoted A' (read "A prime"). A' consists of all outcomes that are NOT in A.
Notice that P(A) + P(A') = 1. For example, let S = {1, 2, 3, 4, 5, 6} and let A = {1, 2, 3, 4}. Then,
A' = {5, 6}. P(A) = 4/6, P(A') = 2/6, and P(A) + P(A') = 4/6 + 2/6 = 1.
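
These three operations correspond to union, intersection, and set difference. A brief Python sketch with the sets from the examples above follows; the probabilities assume the outcomes in S are equally likely, and the names A2 and A2_complement are ours (used only to keep the complement example separate from the first A).

```python
from fractions import Fraction

A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
a_or_b = A | B     # union: in A, in B, or in both
a_and_b = A & B    # intersection: in both A and B

S = {1, 2, 3, 4, 5, 6}
A2 = {1, 2, 3, 4}
A2_complement = S - A2   # outcomes of S that are not in A2

p_a = Fraction(len(A2), len(S))
p_a_complement = Fraction(len(A2_complement), len(S))
print(a_or_b, a_and_b, A2_complement, p_a + p_a_complement)
```

Because a set stores each element once, 4 and 5 appear only once in the union.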

The conditional probability of A given B is written P(A|B). P(A|B) is the probability that event A will
occur given that the event B has already occurred. A conditional reduces the sample space. We calculate
the probability of A from the reduced sample space B. The formula to calculate P(A|B) is

P(A|B) = P(A AND B) / P(B)

where P(B) is greater than 0.

For example, suppose we toss one fair, six-sided die. The sample space S = {1, 2, 3, 4, 5, 6}. Let A =
face is 2 or 3 and B = face is even (2, 4, 6). To calculate P(A|B), we count the number of outcomes 2 or 3 in
the sample space B = {2, 4, 6}. Then we divide that by the number of outcomes in B (and not S). Only one
outcome in B (the face 2) is in A, so P(A|B) = 1/3.

We get the same result by using the formula. Remember that S has 6 outcomes.

P(A|B) = P(A AND B) / P(B)
       = [(the number of outcomes that are 2 or 3 and even in S) / 6] / [(the number of outcomes that are even in S) / 6]
       = (1/6) / (3/6) = 1/3
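
Both routes to P(A|B) can be checked with a short Python sketch. It assumes a fair die, so that every face in S is equally likely; the variable names are ours.

```python
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
A = {2, 3}        # face is 2 or 3
B = {2, 4, 6}     # face is even

# Route 1: count within the reduced sample space B
by_counting = Fraction(len(A & B), len(B))

# Route 2: the formula P(A AND B) / P(B), with probabilities taken over S
p_a_and_b = Fraction(len(A & B), len(S))
p_b = Fraction(len(B), len(S))
by_formula = p_a_and_b / p_b

print(by_counting, by_formula)
```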


Available for free at Connexions <http://cnx.org/content/col10522/1.40>




Understanding  Terminology  and  Symbols 

It is important to read each problem carefully to think about and understand what the events are.
Understanding the wording is the first very important step in solving probability problems. Reread the problem
several times if necessary. Clearly identify the event of interest. Determine whether there is a condition
stated in the wording that would indicate that the probability is conditional; carefully identify the
condition, if any.

Exercise  3.2.1  (Solution  on  p.  160.) 

In  a  particular  college  class,  there  are  male  and  female  students.  Some  students  have  long  hair  and 
some  students  have  short  hair.  Write  the  symbols  for  the  probabilities  of  the  events  for  parts  (a) 
through  (j)  below.  (Note  that  you  can't  find  numerical  answers  here.  You  were  not  given  enough 
information  to  find  any  probability  values  yet;  concentrate  on  understanding  the  symbols.) 

•  Let  F  be  the  event  that  a  student  is  female. 

•  Let  M  be  the  event  that  a  student  is  male. 

•  Let  S  be  the  event  that  a  student  has  short  hair. 

•  Let  L  be  the  event  that  a  student  has  long  hair. 

a.  The  probability  that  a  student  does  not  have  long  hair. 

b.  The  probability  that  a  student  is  male  or  has  short  hair. 

c.  The  probability  that  a  student  is  a  female  and  has  long  hair. 

d.  The  probability  that  a  student  is  male,  given  that  the  student  has  long  hair. 

e.  The  probability  that  a  student  has  long  hair,  given  that  the  student  is  male. 

f.  Of all the female students, the probability that a student has short hair.

g.  Of  all  students  with  long  hair,  the  probability  that  a  student  is  female. 

h.  The  probability  that  a  student  is  female  or  has  long  hair. 

i.  The probability that a randomly selected student is a male student with short hair.

j.  The probability that a student is female.

**With  contributions  from  Roberta  Bloom 

3.3  Independent  and  Mutually  Exclusive  Events^ 

Independent  and  mutually  exclusive  do  not  mean  the  same  thing. 
3.3.1  Independent  Events 

Two events are independent if the following are true:

•  P(A|B) = P(A)

•  P(B|A) = P(B)

•  P(A AND B) = P(A) · P(B)

Two events A and B are independent if the knowledge that one occurred does not affect the chance the
other occurs. For example, the outcomes of two rolls of a fair die are independent events. The outcome
of the first roll does not change the probability for the outcome of the second roll. To show two events are
independent, you must show only one of the above conditions. If two events are NOT independent, then
we say that they are dependent.
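
As an illustration, the conditions can be checked for two rolls of a fair die by listing all 36 equally likely pairs. The particular events chosen here (first roll is a 6, second roll is even) are our own example, not taken from the text.

```python
from fractions import Fraction
from itertools import product

# All (first roll, second roll) pairs for a fair die: 36 equally likely outcomes
rolls = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event, given as a true/false test on an outcome."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

def first_is_six(r):       # example event A
    return r[0] == 6

def second_is_even(r):     # example event B
    return r[1] % 2 == 0

p_a = prob(first_is_six)
p_b = prob(second_is_even)
p_a_and_b = prob(lambda r: first_is_six(r) and second_is_even(r))
p_a_given_b = p_a_and_b / p_b

print(p_a, p_b, p_a_and_b, p_a_given_b)
```

Here P(A AND B) equals P(A) · P(B), and P(A|B) equals P(A), so these two events are independent.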

Sampling may be done with replacement or without replacement.

^This content is available online at <http://cnx.org/content/m16837/1.14/>.

•  With replacement: If each member of a population is replaced after it is picked, then that member has
the possibility of being chosen more than once. When sampling is done with replacement, then events
are considered to be independent, meaning the result of the first pick will not change the probabilities
for the second pick.

•  Without replacement: When sampling is done without replacement, then each member of a
population may be chosen only once. In this case, the probabilities for the second pick are affected by the
result of the first pick. The events are considered to be dependent or not independent.
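
The difference between the two sampling schemes shows up in the probability for the second pick. The sketch below uses a made-up bag of 3 red and 2 blue marbles (hypothetical numbers, chosen only for illustration) and asks for the chance that the second pick is red after a red first pick.

```python
from fractions import Fraction

# Hypothetical bag for illustration: 3 red marbles and 2 blue marbles
red, blue = 3, 2
total = red + blue

p_first_red = Fraction(red, total)

# With replacement: the red marble goes back, so the bag is unchanged
p_second_red_with = Fraction(red, total)

# Without replacement: one red marble is gone, so both counts drop by one
p_second_red_without = Fraction(red - 1, total - 1)

print(p_first_red, p_second_red_with, p_second_red_without)
```

With replacement the second-pick probability matches the first (independent picks); without replacement it changes (dependent picks).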

If  it  is  not  known  whether  A  and  B  are  independent  or  dependent,  assume  they  are  dependent  until  you 
can  show  otherwise. 

3.3.2  Mutually  Exclusive  Events 

A and B are mutually exclusive events if they cannot occur at the same time. This means that A and B do
not share any outcomes and P(A AND B) = 0.

For example, suppose the sample space S = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}. Let
A = {1, 2, 3, 4, 5}, B = {4, 5, 6, 7, 8}, and C = {7, 9}. A AND B = {4, 5}. P(A AND B) =
2/10 and is not equal to zero. Therefore, A and B are not mutually exclusive. A and C do not have any
numbers in common so P(A AND C) = 0. Therefore, A and C are mutually exclusive.

If it is not known whether A and B are mutually exclusive, assume they are not until you can show
otherwise.
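
Checking for mutual exclusivity amounts to checking whether two sets share any outcomes. A short Python sketch with the sets from the example follows; it assumes the ten outcomes in S are equally likely, and the helper name p_and is ours.

```python
from fractions import Fraction

S = set(range(1, 11))
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7, 8}
C = {7, 9}

def p_and(x, y):
    """P(X AND Y) when all outcomes in S are equally likely."""
    return Fraction(len(x & y), len(S))

print(p_and(A, B), A.isdisjoint(B))   # shared outcomes 4 and 5: not mutually exclusive
print(p_and(A, C), A.isdisjoint(C))   # no shared outcomes: mutually exclusive
```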

The  following  examples  illustrate  these  definitions  and  terms. 
Example  3.1 

Flip  two  fair  coins.  (This  is  an  experiment.) 

The  sample  space  is  {HH,  HT,  TH,  TT}  where  T  =  tails  and  H  =  heads.  The  outcomes  are  HH, 
HT,  TH,  and  TT.  The  outcomes  HT  and  TH  are  different.  The  HT  means  that  the  first  coin 
showed  heads  and  the  second  coin  showed  tails.  The  TH  means  that  the  first  coin  showed  tails 
and  the  second  coin  showed  heads. 

• Let A = the event of getting at most one tail. (At most one tail means 0 or 1 tail.) Then A can
be written as {HH, HT, TH}. The outcome HH shows 0 tails. HT and TH each show 1 tail.

• Let B = the event of getting all tails. B can be written as {TT}. B is the complement of A. So,
B = A'. Also, P(A) + P(B) = P(A) + P(A') = 1.

• The probabilities for A and for B are P(A) = 3/4 and P(B) = 1/4.

• Let C = the event of getting all heads. C = {HH}. Since B = {TT}, P(B AND C) = 0.
B and C are mutually exclusive. (B and C have no members in common because you cannot
have all tails and all heads at the same time.)

• Let D = event of getting more than one tail. D = {TT}. P(D) = 1/4.

• Let E = event of getting a head on the first flip. (This implies you can get either a head or tail
on the second flip.) E = {HT, HH}. P(E) = 2/4.

• Find the probability of getting at least one (1 or 2) tail in two flips. Let F = event of getting
at least one tail in two flips. F = {HT, TH, TT}. P(F) = 3/4.
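The events in this example can also be found by listing the sample space and counting. The sketch below is one way to do that in Python; the helper `prob` and the variable names are illustrative and not part of the text.

```python
from fractions import Fraction
from itertools import product

# All ordered outcomes of two flips: HH, HT, TH, TT.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]

def prob(condition):
    """P(event), where the event is the set of outcomes satisfying condition."""
    favorable = [o for o in sample_space if condition(o)]
    return Fraction(len(favorable), len(sample_space))

p_A = prob(lambda o: o.count("T") <= 1)   # at most one tail
p_B = prob(lambda o: o.count("T") == 2)   # all tails (complement of A)
p_E = prob(lambda o: o[0] == "H")         # head on the first flip
p_F = prob(lambda o: o.count("T") >= 1)   # at least one tail

print(sample_space)          # ['HH', 'HT', 'TH', 'TT']
print(p_A, p_B, p_E, p_F)    # 3/4 1/4 1/2 3/4
```

Because HT and TH are listed as separate outcomes, each of the four outcomes is equally likely and simple counting gives the probabilities.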

Example  3.2 

Roll  one  fair  6-sided  die.  The  sample  space  is  {1,  2,  3,  4,  5,  6}.  Let  event  A  =  a  face  is  odd.  Then 
A  =  {1,  3,  5}.  Let  event  B  =  a  face  is  even.  Then  B  =  {2,  4,  6}. 

• Find the complement of A, A'. The complement of A, A', is B because A and B together
make up the sample space. P(A) + P(B) = P(A) + P(A') = 1. Also, P(A) = 3/6 and P(B) = 3/6.

• Let event C = odd faces larger than 2. Then C = {3, 5}. Let event D = all even faces smaller
than 5. Then D = {2, 4}. P(C AND D) = 0 because you cannot have an odd and even face at
the same time. Therefore, C and D are mutually exclusive events.

• Let event E = all faces less than 5. E = {1, 2, 3, 4}.

Problem  (Solution  on  p.  160.) 

Are  C  and  E  mutually  exclusive  events?  (Answer  yes  or  no.)  Why  or  why  not? 

• Find P(C | A). This is a conditional probability. Recall that the event C is {3, 5} and event A is {1, 3, 5}.
To find P(C | A), find the probability of C using the sample space A. You have reduced the
sample space from the original sample space {1, 2, 3, 4, 5, 6} to {1, 3, 5}. So, P(C | A) = 2/3.
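The "reduced sample space" idea translates directly into counting inside the conditioning event. The following is a minimal sketch, assuming a fair die so that all faces are equally likely; the function name `conditional` is hypothetical.

```python
from fractions import Fraction

A = {1, 3, 5}   # a face is odd
C = {3, 5}      # odd faces larger than 2

def conditional(event, given):
    """P(event | given) for equally likely outcomes.

    Only outcomes inside `given` are counted, which is exactly the
    reduced sample space described above.
    """
    return Fraction(len(event & given), len(given))

p_C_given_A = conditional(C, A)
print(p_C_given_A)   # 2/3
```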

Example  3.3 

Let event G = taking a math class. Let event H = taking a science class. Then, G AND H = taking
a math class and a science class. Suppose P(G) = 0.6, P(H) = 0.5, and P(G AND H) = 0.3. Are
G and H independent?

To show that G and H are independent, you must show ONE of the following:

• P(G | H) = P(G)

• P(H | G) = P(H)

• P(G AND H) = P(G) · P(H)

NOTE:  The  choice  you  make  depends  on  the  information  you  have.  You  could  choose  any  of  the 
methods  here  because  you  have  the  necessary  information. 

Problem  1 

Show that P(G | H) = P(G).
Solution

P(G | H) = P(G AND H) / P(H) = 0.3 / 0.5 = 0.6 = P(G)
Problem  2 

Show  P(G  AND  H)  =  P(G)  •  P(H). 
Solution 

P  (G)  •  P  (H)  =  0.6  •  0.5  =  0.3  =  P(G  AND  H) 

Since G and H are independent, knowing that a person is taking a science class does not
change the chance that he/she is taking math. If the two events had not been independent (that
is, if they were dependent), then knowing that a person is taking a science class would change the
chance he/she is taking math. For practice, show that P(H | G) = P(H) to show that G and H are
independent events.
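All three independence conditions can be checked numerically from the three given probabilities. The sketch below uses `math.isclose` rather than `==` because the values are floating-point decimals; the dictionary `checks` is an illustrative name.

```python
import math

p_G, p_H, p_G_and_H = 0.6, 0.5, 0.3   # values given in the example

# Conditional probabilities from the definition P(X | Y) = P(X AND Y) / P(Y).
p_G_given_H = p_G_and_H / p_H
p_H_given_G = p_G_and_H / p_G

checks = {
    "P(G|H) = P(G)": math.isclose(p_G_given_H, p_G),
    "P(H|G) = P(H)": math.isclose(p_H_given_G, p_H),
    "P(G AND H) = P(G)P(H)": math.isclose(p_G_and_H, p_G * p_H),
}
for name, holds in checks.items():
    print(name, holds)
```

For independent events all three conditions hold together, so showing any one of them is enough.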

Example  3.4 

In  a  box  there  are  3  red  cards  and  5  blue  cards.  The  red  cards  are  marked  with  the  numbers  1,  2, 
and  3,  and  the  blue  cards  are  marked  with  the  numbers  1, 2, 3, 4,  and  5.  The  cards  are  well-shuffled. 
You reach into the box (you cannot see into it) and draw one card.

Let  R  =  red  card  is  drawn,  B  =  blue  card  is  drawn,  E  =  even-numbered  card  is  drawn. 

The sample space S = {R1, R2, R3, B1, B2, B3, B4, B5}. S has 8 outcomes.

• P(R) = 3/8. P(B) = 5/8. P(R AND B) = 0. (You cannot draw one card that is both red and blue.)

• P(E) = 3/8. (There are 3 even-numbered cards: R2, B2, and B4.)

• P(E | B) = 2/5. (There are 5 blue cards: B1, B2, B3, B4, and B5. Out of the blue cards, there are
2 even cards: B2 and B4.)

• P(B | E) = 2/3. (There are 3 even-numbered cards: R2, B2, and B4. Out of the even-numbered
cards, 2 are blue: B2 and B4.)

• The events R and B are mutually exclusive because P(R AND B) = 0.

• Let G = card with a number greater than 3. G = {B4, B5}. P(G) = 2/8. Let H = blue card
numbered between 1 and 4, inclusive. H = {B1, B2, B3, B4}. P(G | H) = 1/4. (The only card in
H that has a number greater than 3 is B4.) Since 2/8 = 1/4, P(G) = P(G | H), which means that
G and H are independent.
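The card counts in this example can be reproduced by listing the eight cards and filtering. The following Python sketch is only one possible encoding: each card is a (color, number) pair, and the function names are illustrative.

```python
from fractions import Fraction

# Each card is (color, number): 3 red cards and 5 blue cards.
cards = [("R", n) for n in (1, 2, 3)] + [("B", n) for n in (1, 2, 3, 4, 5)]

def is_blue(card):
    return card[0] == "B"

def is_even(card):
    return card[1] % 2 == 0

def in_G(card):   # number greater than 3
    return card[1] > 3

def in_H(card):   # blue card numbered 1 through 4
    return is_blue(card) and 1 <= card[1] <= 4

def prob(event, given=lambda card: True):
    """P(event | given) by counting; with no condition this is P(event)."""
    reduced = [c for c in cards if given(c)]
    return Fraction(sum(1 for c in reduced if event(c)), len(reduced))

print(prob(is_even), prob(is_even, is_blue), prob(is_blue, is_even))   # 3/8 2/5 2/3
print(prob(in_G), prob(in_G, in_H), prob(in_G) == prob(in_G, in_H))    # 1/4 1/4 True
```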

Example  3.5 

In a particular college class, 60% of the students are female. 50% of all students in the class have
long  hair.  45%  of  the  students  are  female  and  have  long  hair.  Of  the  female  students,  75%  have 
long  hair.  Let  F  be  the  event  that  the  student  is  female.  Let  L  be  the  event  that  the  student  has 
long  hair.  One  student  is  picked  randomly.  Are  the  events  of  being  female  and  having  long  hair 
independent? 

• The following probabilities are given in this example:

• P(F) = 0.60; P(L) = 0.50

• P(F AND L) = 0.45

• P(L | F) = 0.75

NOTE: The choice you make depends on the information you have. You could use the first or
last condition on the list for this example. You do not know P(F | L) yet, so you cannot use the
second condition.

Solution  1 

Check whether P(F AND L) = P(F)P(L): We are given that P(F AND L) = 0.45, but P(F)P(L) =
(0.60)(0.50) = 0.30. The events of being female and having long hair are not independent because
P(F AND L) does not equal P(F)P(L).

Solution 2

Check whether P(L | F) equals P(L): We are given that P(L | F) = 0.75, but P(L) = 0.50; they are not
equal. The events of being female and having long hair are not independent.

Interpretation  of  Results 

The  events  of  being  female  and  having  long  hair  are  not  independent;  knowing  that  a  student  is 
female  changes  the  probability  that  a  student  has  long  hair. 

**Example  5  contributed  by  Roberta  Bloom 

3.4 Two Basic Rules of Probability
3.4.1 The Multiplication Rule

If A and B are two events defined on a sample space, then: P(A AND B) = P(B) · P(A | B).
This rule may also be written as: P(A | B) = P(A AND B) / P(B)
(The probability of A given B equals the probability of A and B divided by the probability of B.)

If A and B are independent, then P(A | B) = P(A). Then P(A AND B) = P(A | B) P(B) becomes
P(A AND B) = P(A) P(B).

*This content is available online at <http://cnx.org/content/m16847/1.11/>.

3.4.2  The  Addition  Rule 

If  A  and  B  are  defined  on  a  sample  space,  then:  P(A  OR  B)  =  P(A)  +  P(B)  -  P(A  AND  B). 

If  A  and  B  are  mutually  exclusive,  then  P(A  AND  B)  =  0.  Then  P(A  OR  B)  =  P(A)  +  P(B)  -  P(A  AND  B) 
becomes  P(A  OR  B)  =  P(A)  +  P(B). 
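The two rules can be written as small functions, which makes the special cases easy to see. This is only a sketch: the function names `p_and` and `p_or` and the numbers in the calls are illustrative and do not come from the text.

```python
def p_and(p_b, p_a_given_b):
    """Multiplication rule: P(A AND B) = P(B) * P(A | B)."""
    return p_b * p_a_given_b

def p_or(p_a, p_b, p_a_and_b):
    """Addition rule: P(A OR B) = P(A) + P(B) - P(A AND B)."""
    return p_a + p_b - p_a_and_b

# Independent events: P(A | B) = P(A), so the product is just P(B) * P(A).
print(p_and(0.5, 0.5))        # 0.25

# Mutually exclusive events: P(A AND B) = 0, so the sum is P(A) + P(B).
print(p_or(0.5, 0.25, 0.0))   # 0.75
```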
Example  3.6 

Klaus is trying to choose where to go on vacation. His two choices are: A = New Zealand and B
= Alaska.

• Klaus can only afford one vacation. The probability that he chooses A is P(A) = 0.6 and the
probability that he chooses B is P(B) = 0.35.

• P(A AND B) = 0 because Klaus can only afford to take one vacation.

• Therefore, the probability that he chooses either New Zealand or Alaska is P(A OR B) =
P(A) + P(B) = 0.6 + 0.35 = 0.95. Note that the probability that he does not choose to go
anywhere on vacation must be 0.05.

Example  3.7 

Carlos  plays  college  soccer.  He  makes  a  goal  65%  of  the  time  he  shoots.  Carlos  is  going  to  attempt 
two  goals  in  a  row  in  the  next  game. 

A  =  the  event  Carlos  is  successful  on  his  first  attempt.  P(A)  =  0.65.  B  =  the  event  Carlos  is 
successful  on  his  second  attempt.  P(B)  =  0.65.  Carlos  tends  to  shoot  in  streaks.  The  probability 
that  he  makes  the  second  goal  GIVEN  that  he  made  the  first  goal  is  0.90. 

Problem  1 

What  is  the  probability  that  he  makes  both  goals? 
Solution 

The problem is asking you to find P(A AND B) = P(B AND A). Since P(B | A) = 0.90:

P(B AND A) = P(B | A) P(A) = 0.90 · 0.65 = 0.585 (3.1)

Carlos makes the first and second goals with probability 0.585.

Problem  2 

What  is  the  probability  that  Carlos  makes  either  the  first  goal  or  the  second  goal? 
Solution 

The  problem  is  asking  you  to  find  P(A  OR  B). 

P(A OR B) = P(A) + P(B) - P(A AND B) = 0.65 + 0.65 - 0.585 = 0.715 (3.2)

Carlos makes either the first goal or the second goal with probability 0.715.

Problem  3 

Are  A  and  B  independent? 
Solution

No, they are not, because P(B AND A) = 0.585.

P(B) · P(A) = (0.65) · (0.65) ≈ 0.423 (3.3)

0.423 ≠ 0.585 = P(B AND A) (3.4)

So, P(B AND A) is not equal to P(B) · P(A).

Problem  4 

Are  A  and  B  mutually  exclusive? 
Solution 

No,  they  are  not  because  P(A  and  B)  =  0.585. 

To  be  mutually  exclusive,  P(A  AND  B)  must  equal  0. 
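The four answers in this example follow from three given numbers, so they can be recomputed in a few lines. The variable names below are illustrative; `math.isclose` is used because the values are floating-point decimals.

```python
import math

p_A = 0.65           # Carlos makes the first goal
p_B = 0.65           # Carlos makes the second goal
p_B_given_A = 0.90   # makes the second goal GIVEN he made the first

p_both = p_B_given_A * p_A       # multiplication rule
p_either = p_A + p_B - p_both    # addition rule

independent = math.isclose(p_both, p_A * p_B)
mutually_exclusive = math.isclose(p_both, 0.0, abs_tol=1e-12)

print(round(p_both, 3), round(p_either, 3))   # 0.585 0.715
print(independent, mutually_exclusive)        # False False
```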


Example  3.8 

A community swim team has 150 members. Seventy-five of the members are advanced swimmers.
Forty-seven of the members are intermediate swimmers. The remainder are novice swimmers.
Forty of the advanced swimmers practice 4 times a week. Thirty of the intermediate swimmers
practice 4 times a week. Ten of the novice swimmers practice 4 times a week. Suppose one
member of the swim team is randomly chosen. Answer the questions (verify the answers):

Problem  1 

What  is  the  probability  that  the  member  is  a  novice  swimmer? 
Solution 

28/150


Problem  2 

What  is  the  probability  that  the  member  practices  4  times  a  week? 
Solution 

80/150


Problem  3 

What  is  the  probability  that  the  member  is  an  advanced  swimmer  and  practices  4  times  a  week? 

Solution 

40/150


Problem  4 

What  is  the  probability  that  a  member  is  an  advanced  swimmer  and  an  intermediate  swimmer? 
Are  being  an  advanced  swimmer  and  an  intermediate  swimmer  mutually  exclusive?  Why  or  why 
not? 

Solution 

P(advanced AND intermediate) = 0, so these are mutually exclusive events. A swimmer cannot
be an advanced swimmer and an intermediate swimmer at the same time.
Problem 5

Are  being  a  novice  swimmer  and  practicing  4  times  a  week  independent  events?  Why  or  why 
not? 


Solution 

No,  these  are  not  independent  events. 


P(novice AND practices 4 times per week) = 10/150 = 0.0667 (3.5)
P(novice) · P(practices 4 times per week) = (28/150)(80/150) = 0.0996 (3.6)
0.0667 ≠ 0.0996 (3.7)
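The answers in this example can be verified from the raw counts. The sketch below stores the counts in dictionaries (an illustrative choice, not something the text prescribes) and uses exact fractions so the independence test is not affected by rounding.

```python
from fractions import Fraction

total = 150
level_counts = {"advanced": 75, "intermediate": 47}
# "The remainder are novice swimmers."
level_counts["novice"] = total - sum(level_counts.values())
practice_4x = {"advanced": 40, "intermediate": 30, "novice": 10}

p_novice = Fraction(level_counts["novice"], total)
p_4x = Fraction(sum(practice_4x.values()), total)
p_novice_and_4x = Fraction(practice_4x["novice"], total)

print(level_counts["novice"], sum(practice_4x.values()))    # 28 80
print(round(float(p_novice_and_4x), 4))                     # 0.0667
print(round(float(p_novice * p_4x), 4))                     # 0.0996
print("independent:", p_novice_and_4x == p_novice * p_4x)   # independent: False
```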


Example  3.9 

Studies  show  that,  if  she  lives  to  be  90,  about  1  woman  in  7  (approximately  14.3%)  will  develop 
breast  cancer.  Suppose  that  of  those  women  who  develop  breast  cancer,  a  test  is  negative  2%  of  the 
time.  Also  suppose  that  in  the  general  population  of  women,  the  test  for  breast  cancer  is  negative 
about 85% of the time. Let B = woman develops breast cancer and let N = tests negative. Suppose
one woman is selected at random.

Problem  1 

What is the probability that the woman develops breast cancer? What is the probability that the
woman tests negative?

Solution

P(B) = 0.143; P(N) = 0.85


Problem  2 

Given  that  the  woman  has  breast  cancer,  what  is  the  probability  that  she  tests  negative? 

Solution 
P(N | B) = 0.02


Problem  3 

What  is  the  probability  that  the  woman  has  breast  cancer  AND  tests  negative? 
Solution 

P(B AND N) = P(B) · P(N | B) = (0.143) · (0.02) = 0.0029


Problem  4 

What  is  the  probability  that  the  woman  has  breast  cancer  or  tests  negative? 
Solution 

P(B OR N) = P(B) + P(N) - P(B AND N) = 0.143 + 0.85 - 0.0029 = 0.9901
Problem 5

Are  having  breast  cancer  and  testing  negative  independent  events? 
Solution 

No. P(N) = 0.85; P(N | B) = 0.02. So, P(N | B) does not equal P(N).
Problem  6 

Are  having  breast  cancer  and  testing  negative  mutually  exclusive? 
Solution 

No.  P(B  AND  N)  =  0.0029.  For  B  and  N  to  be  mutually  exclusive,  P(B  AND  N)  must  be  0. 
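The chain of calculations in this example (multiplication rule, then addition rule, then the independence check) can be sketched as follows. The variable names are illustrative, and results are rounded to four decimal places as in the text.

```python
import math

p_B = 0.143          # woman develops breast cancer
p_N = 0.85           # test is negative (general population)
p_N_given_B = 0.02   # test is negative GIVEN breast cancer

p_B_and_N = p_B * p_N_given_B       # multiplication rule
p_B_or_N = p_B + p_N - p_B_and_N    # addition rule

print(round(p_B_and_N, 4))   # 0.0029
print(round(p_B_or_N, 4))    # 0.9901
print("independent:", math.isclose(p_N_given_B, p_N))   # independent: False
print("mutually exclusive:", p_B_and_N == 0)            # mutually exclusive: False
```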


3.5 Contingency Tables

A  contingency  table  provides  a  way  of  portraying  data  that  can  facilitate  calculating  probabilities.  The  table 
helps  in  determining  conditional  probabilities  quite  easily.  The  table  displays  sample  values  in  relation 
to  two  different  variables  that  may  be  dependent  or  contingent  on  one  another.  Later  on,  we  will  use 
contingency tables again, but in another manner.

Example  3.10 

Suppose  a  study  of  speeding  violations  and  drivers  who  use  car  phones  produced  the  following 
fictional  data: 


                       Speeding violation   No speeding violation   Total
                       in the last year     in the last year
Car phone user         25                   280                     305
Not a car phone user   45                   405                     450
Total                  70                   685                     755

Table  3.1 


The  total  number  of  people  in  the  sample  is  755.  The  row  totals  are  305  and  450.  The  column  totals 
are  70  and  685.  Notice  that  305  +  450  =  755  and  70  +  685  =  755. 

Calculate the following probabilities using the table.
Problem 1

P(person is a car phone user) =
Solution

number of car phone users / total number in study = 305/755

Problem  2 

P(person  had  no  violation  in  the  last  year)  = 
Solution

number that had no violation / total number in study = 685/755

*This content is available online at <http://cnx.org/content/m16835/1.12/>.


Problem  3 

P(person had no violation in the last year AND was a car phone user) =
Solution

280/755


Problem  4 

P(person  is  a  car  phone  user  OR  person  had  no  violation  in  the  last  year)  = 

Solution 

(305/755 + 685/755) - 280/755 = 710/755


Problem  5 

P(person  is  a  car  phone  user  GIVEN  person  had  a  violation  in  the  last  year)  = 
Solution 

25/70 (The sample space is reduced to the number of persons who had a violation.)
Problem  6 

P(person  had  no  violation  last  year  GIVEN  person  was  not  a  car  phone  user)  = 
Solution 

405/450 (The sample space is reduced to the number of persons who were not car phone users.)
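Every probability in this example is a ratio of sums of table cells, so the table can be stored once and queried. The sketch below uses the counts from Table 3.1; the dictionary layout and the helper `count` are illustrative choices.

```python
from fractions import Fraction

# Table 3.1 counts: keys are (phone use, violation status).
table = {
    ("phone", "violation"): 25,
    ("phone", "no violation"): 280,
    ("no phone", "violation"): 45,
    ("no phone", "no violation"): 405,
}
total = sum(table.values())

def count(row=None, col=None):
    """Sum the cells matching the given row and/or column label."""
    return sum(n for (r, c), n in table.items()
               if row in (None, r) and col in (None, c))

p_phone = Fraction(count(row="phone"), total)
p_no_viol = Fraction(count(col="no violation"), total)
p_both = Fraction(count("phone", "no violation"), total)
p_either = p_phone + p_no_viol - p_both   # addition rule

# Conditional probabilities divide by a row or column total, not by 755.
p_phone_given_viol = Fraction(count("phone", "violation"), count(col="violation"))
p_no_viol_given_no_phone = Fraction(count("no phone", "no violation"), count(row="no phone"))

print(total, p_either == Fraction(710, 755))          # 755 True
print(p_phone_given_viol, p_no_viol_given_no_phone)   # 5/14 9/10
```

The printed fractions are the reduced forms of 25/70 and 405/450.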


Example  3.11 

The  following  table  shows  a  random  sample  of  100  hikers  and  the  areas  of  hiking  preferred: 


Hiking Area Preference

Sex      The Coastline   Near Lakes and Streams   On Mountain Peaks   Total
Female   18              16                       ___                 45
Male     ___             ___                      14                  55
Total    ___             41                       ___                 ___

Table  3.2 


Problem 1 (Solution on p. 160.)
Complete the table.

Problem 2 (Solution on p. 160.)

Are the events "being female" and "preferring the coastline" independent events?

Let F = being female and let C = preferring the coastline.
a. P(F AND C) =
b. P(F) · P(C) =

Are these two numbers the same? If they are, then F and C are independent. If they are not, then
F and C are not independent.

Problem  3  (Solution  on  p.  160.) 

Find  the  probability  that  a  person  is  male  given  that  the  person  prefers  hiking  near  lakes  and 
streams.  Let  M  =  being  male  and  let  L  =  prefers  hiking  near  lakes  and  streams. 

a.  What  word  tells  you  this  is  a  conditional? 

b. Fill in the blanks and calculate the probability: P(___ | ___) = ___.

c.  Is  the  sample  space  for  this  problem  all  100  hikers?  If  not,  what  is  it? 

Problem  4  (Solution  on  p.  160.) 

Find the probability that a person is female or prefers hiking on mountain peaks. Let F = being
female and let P = prefers mountain peaks.

a. P(F) =

b. P(P) =

c. P(F AND P) =

d. Therefore, P(F OR P) =


Example  3.12 

Muddy Mouse lives in a cage with 3 doors. If Muddy goes out the first door, the probability that
he gets caught by Alissa the cat is 1/5 and the probability he is not caught is 4/5. If he goes out the
second door, the probability he gets caught by Alissa is 1/4 and the probability he is not caught is 3/4.
The probability that Alissa catches Muddy coming out of the third door is 1/2 and the probability
she does not catch Muddy is 1/2. It is equally likely that Muddy will choose any of the three doors,
so the probability of choosing each door is 1/3.

Door Choice

Caught or Not   Door One   Door Two   Door Three   Total
Caught          1/15       1/12       1/6          ___
Not Caught      4/15       3/12       1/6          ___
Total           ___        ___        ___          1

Table 3.3

The first entry 1/15 = (1/5)(1/3) is P(Door One AND Caught).
The entry 4/15 = (4/5)(1/3) is P(Door One AND Not Caught).

Verify the remaining entries.

Problem  1  (Solution  on  p.  160.) 

Complete the probability contingency table. Calculate the entries for the totals. Verify that the
lower-right corner entry is 1.

Problem  2 

What  is  the  probability  that  Alissa  does  not  catch  Muddy? 
Solution

41/60


Problem  3 

What  is  the  probability  that  Muddy  chooses  Door  One  OR  Door  Two  given  that  Muddy  is  caught 
by  Alissa? 

Solution 

9/19


NOTE: You could also do this problem by using a probability tree. See the Tree Diagrams (Optional)
(Section 3.7) section of this chapter for examples.
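The two stated answers, 41/60 and 9/19, can be reproduced with the multiplication rule. The sketch below assumes the per-door catch probabilities are 1/5, 1/4, and 1/2 with each door chosen with probability 1/3; these are the values consistent with the stated answers, and the variable names are illustrative.

```python
from fractions import Fraction

p_door = Fraction(1, 3)   # each door is equally likely

# Assumed P(caught | door) for each door; see the note above.
p_caught_given_door = {
    "one": Fraction(1, 5),
    "two": Fraction(1, 4),
    "three": Fraction(1, 2),
}

# Joint probabilities: P(door AND caught) = P(door) * P(caught | door).
caught = {d: p_door * p for d, p in p_caught_given_door.items()}
not_caught = {d: p_door * (1 - p) for d, p in p_caught_given_door.items()}

p_caught = sum(caught.values())
p_not_caught = sum(not_caught.values())

# Conditional: restrict to the "Caught" row, then take Doors One and Two.
p_one_or_two_given_caught = (caught["one"] + caught["two"]) / p_caught

print(p_not_caught, p_one_or_two_given_caught)   # 41/60 9/19
print(p_caught + p_not_caught)                   # 1
```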


3.6 Venn Diagrams (optional)

A  Venn  diagram  is  a  picture  that  represents  the  outcomes  of  an  experiment.  It  generally  consists  of  a  box 
that  represents  the  sample  space  S  together  with  circles  or  ovals.  The  circles  or  ovals  represent  events. 

Example  3.13 

Suppose  an  experiment  has  the  outcomes  1,  2,  3,  ...  ,  12  where  each  outcome  has  an  equal  chance 
of  occurring.  Let  event  A  =  {1,  2,  3, 4,  5,  6}  and  event  B  =  {6,  7,  8, 9}.  Then  A  AND  B  =  {6} 
and  A  OR  B  =  {1,  2,  3, 4,  5, 6,  7, 8,  9}.  The  Venn  diagram  is  as  follows: 


Example  3.14 

Flip 2 fair coins. Let A = tails on the first coin. Let B = tails on the second coin. Then A = {TT, TH}
and B = {TT, HT}. Therefore, A AND B = {TT}. A OR B = {TH, TT, HT}.

The  sample  space  when  you  flip  two  fair  coins  is  S  =  {HH,  HT,  TH,  TT}.  The  outcome  HH  is  in 
neither  A  nor  B.  The  Venn  diagram  is  as  follows: 

[Venn diagram: sample space S with circles A and B; the outcome HH lies outside both circles.]

*This content is available online at <http://cnx.org/content/m16848/1.12/>.

Example  3.15 

Forty  percent  of  the  students  at  a  local  college  belong  to  a  club  and  50%  work  part  time.  Five 
percent  of  the  students  work  part  time  and  belong  to  a  club.  Draw  a  Venn  diagram  showing  the 
relationships.  Let  C  =  student  belongs  to  a  club  and  PT  =  student  works  part  time. 


If a student is selected at random, find

• The probability that the student belongs to a club. P(C) = 0.40.

• The probability that the student works part time. P(PT) = 0.50.

• The probability that the student belongs to a club AND works part time. P(C AND PT) = 0.05.

• The probability that the student belongs to a club given that the student works part time.

P(C | PT) = P(C AND PT) / P(PT) = 0.05 / 0.50 = 0.1 (3.8)

• The probability that the student belongs to a club OR works part time.

P(C OR PT) = P(C) + P(PT) - P(C AND PT) = 0.40 + 0.50 - 0.05 = 0.85 (3.9)


3.7 Tree Diagrams (optional)

A tree diagram is a special type of graph used to determine the outcomes of an experiment. It consists of
"branches" that are labeled with either frequencies or probabilities. Tree diagrams can make some probability
problems easier to visualize and solve. The following example illustrates how to use a tree diagram.

*This content is available online at <http://cnx.org/content/m16846/1.10/>.
Example 3.16

In  an  urn,  there  are  11  balls.  Three  balls  are  red  (R)  and  8  balls  are  blue  (B).  Draw  two  balls,  one 
at  a  time,  with  replacement.  "With  replacement"  means  that  you  put  the  first  ball  back  in  the  urn 
before  you  select  the  second  ball.  The  tree  diagram  using  frequencies  that  show  all  the  possible 
outcomes  follows. 


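[Tree diagram: the first draw branches to 3 R and 8 B; each branch splits again into 3 R and 8 B for the second draw. End counts: 9 RR, 24 RB, 24 BR, 64 BB.]

Figure 3.1: Total = 64 + 24 + 24 + 9 = 121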


The first set of branches represents the first draw. The second set of branches represents the second
draw. Each of the outcomes is distinct. In fact, we can list each red ball as R1, R2, and R3 and each
blue ball as B1, B2, B3, B4, B5, B6, B7, and B8. Then the 9 RR outcomes can be written as:

R1R1; R1R2; R1R3; R2R1; R2R2; R2R3; R3R1; R3R2; R3R3

The other outcomes are similar.

There are a total of 11 balls in the urn. Draw two balls, one at a time, and with replacement. There
are 11 · 11 = 121 outcomes, the size of the sample space.

Problem  1  (Solution  on  p.  161.) 

List the 24 BR outcomes: B1R1, B1R2, B1R3, ...

Problem  2 

Using  the  tree  diagram,  calculate  P(RR). 
Solution 

P(RR) = (3/11)(3/11) = 9/121


Problem  3 

Using  the  tree  diagram,  calculate  P(RB  OR  BR). 
Solution 

P(RB OR BR) = (3/11)(8/11) + (8/11)(3/11) = 48/121
Problem 4

Using the tree diagram, calculate P(R on 1st draw AND B on 2nd draw).
Solution

P(R on 1st draw AND B on 2nd draw) = P(RB) = (3/11)(8/11) = 24/121
Problem  5 

Using  the  tree  diagram,  calculate  P(R  on  2nd  draw  given  B  on  1st  draw). 
Solution 

P(R on 2nd draw given B on 1st draw) = P(R on 2nd | B on 1st) = 24/88 = 3/11

This problem is a conditional. The sample space has been reduced to those outcomes that already
have a blue on the first draw. There are 24 + 64 = 88 possible outcomes (24 BR and 64 BB).
Twenty-four of the 88 possible outcomes are BR. 24/88 = 3/11.
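The frequencies on the tree can be generated by listing all ordered pairs of balls. The sketch below uses `itertools.product`, which allows the same ball to appear twice and so models drawing with replacement; the labels and helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

balls = ["R1", "R2", "R3"] + [f"B{i}" for i in range(1, 9)]

# With replacement: every ordered pair of balls is possible, 11 * 11 = 121.
outcomes = list(product(balls, repeat=2))

def color_pattern(pair):
    """Map an outcome such as ('R1', 'B4') to its color pattern 'RB'."""
    return pair[0][0] + pair[1][0]

counts = {}
for pair in outcomes:
    key = color_pattern(pair)
    counts[key] = counts.get(key, 0) + 1

p_RR = Fraction(counts["RR"], len(outcomes))
# Conditional on B first: the sample space shrinks to the BR and BB outcomes.
p_R2_given_B1 = Fraction(counts["BR"], counts["BR"] + counts["BB"])

print(counts)                # {'RR': 9, 'RB': 24, 'BR': 24, 'BB': 64}
print(p_RR, p_R2_given_B1)   # 9/121 3/11
```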

Problem  6  (Solution  on  p.  161.) 

Using  the  tree  diagram,  calculate  P(BB). 

Problem  7  (Solution  on  p.  161.) 

Using  the  tree  diagram,  calculate  P(B  on  the  2nd  draw  given  R  on  the  first  draw). 

Example  3.17 

An urn has 3 red marbles and 8 blue marbles in it. Draw two marbles, one at a time, this time
without replacement from the urn. "Without replacement" means that you do not put the first
marble back before you select the second marble. Below is a tree diagram. The branches are labeled with
probabilities instead of frequencies. The numbers at the ends of the branches are calculated by
multiplying the numbers on the two corresponding branches, for example, (3/11)(2/10) = 6/110.


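[Tree diagram: the first draw branches to R with probability 3/11 and B with probability 8/11; the second-draw branches are out of the 10 remaining marbles. End probabilities: BB 56/110, BR 24/110, RB 24/110, RR 6/110.]

Figure 3.2: Total = (56 + 24 + 24 + 6)/110 = 110/110 = 1

NOTE: If you draw a red on the first draw from the 3 red possibilities, there are 2 red left to draw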
on  the  second  draw.  You  do  not  put  back  or  replace  the  first  ball  after  you  have  drawn  it.  You 
draw  without  replacement,  so  that  on  the  second  draw  there  are  10  marbles  left  in  the  urn. 

Calculate  the  following  probabilities  using  the  tree  diagram. 

Problem  1 

P(RR) =

Solution

P(RR) = (3/11)(2/10) = 6/110


Problem 2 (Solution on p. 161.)

Fill in the blanks:

P(RB OR BR) = (3/11)(8/10) + (___)(___)

Problem 3 (Solution on p. 161.)

P(R on 2nd | B on 1st) =

Problem 4 (Solution on p. 161.)

Fill in the blanks:

P(R on 1st AND B on 2nd) = P(RB) = (___)(___) = 24/110

Problem 5 (Solution on p. 161.)

P(BB) =


Problem  6 

P(B on 2nd | R on 1st) =
Solution

There are 6 + 24 outcomes that have R on the first draw (6 RR and 24 RB). The 6 and the 24
are frequencies. They are also the numerators of the fractions 6/110 and 24/110. The sample space is no
longer 110 but 6 + 24 = 30. Twenty-four of the 30 outcomes have B on the second draw. The
probability is then 24/30. Did you get this answer?
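Drawing without replacement can be modeled by listing ordered pairs of two different marbles. The sketch below uses `itertools.permutations`, which never repeats an item within a pair; the labels and the helper `count` are illustrative.

```python
from fractions import Fraction
from itertools import permutations

marbles = ["R1", "R2", "R3"] + [f"B{i}" for i in range(1, 9)]

# Without replacement: ordered pairs of two different marbles, 11 * 10 = 110.
outcomes = list(permutations(marbles, 2))

def count(pattern):
    """Number of outcomes whose colors match a pattern such as 'RB'."""
    return sum(1 for first, second in outcomes
               if first[0] + second[0] == pattern)

p_RR = Fraction(count("RR"), len(outcomes))
# Conditional on R first: the sample space shrinks to the RR and RB outcomes.
p_B2_given_R1 = Fraction(count("RB"), count("RR") + count("RB"))

print(len(outcomes), count("RR"), count("RB"))   # 110 6 24
print(p_RR, p_B2_given_R1)                       # 3/55 4/5
```

The printed fractions are the reduced forms of 6/110 and 24/30.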


If  we  are  using  probabilities,  we  can  label  the  tree  in  the  following  general  way. 
3.8 Summary of Formulas

Formula 3.1: Complement

If A and A' are complements, then P(A) + P(A') = 1.

Formula 3.2: Addition Rule

P(A OR B) = P(A) + P(B) - P(A AND B)

Formula 3.3: Mutually Exclusive

If A and B are mutually exclusive, then P(A AND B) = 0, so P(A OR B) = P(A) + P(B).

Formula 3.4: Multiplication Rule

• P(A AND B) = P(B)P(A | B)

• P(A AND B) = P(A)P(B | A)

Formula 3.5: Independence

If A and B are independent, then:

• P(A | B) = P(A)

• P(B | A) = P(B)

• P(A AND B) = P(A)P(B)


*This content is available online at <http://cnx.org/content/m16843/1.5/>.
3.9 Practice 1: Contingency Tables

3.9.1  Student  Learning  Outcomes 

•  The  student  will  construct  and  interpret  contingency  tables. 


3.9.2  Given 

An article in the New England Journal of Medicine reported on a study of smokers in California and
Hawaii.  In  one  part  of  the  report,  the  self-reported  ethnicity  and  smoking  levels  per  day  were  given.  Of  the 
people  smoking  at  most  10  cigarettes  per  day,  there  were  9886  African  Americans,  2745  Native  Hawaiians, 
12,831  Latinos,  8378  Japanese  Americans,  and  7650  Whites.  Of  the  people  smoking  11-20  cigarettes  per 
day,  there  were  6514  African  Americans,  3062  Native  Hawaiians,  4932  Latinos,  10,680  Japanese  Americans, 
and  9877  Whites.  Of  the  people  smoking  21-30  cigarettes  per  day,  there  were  1671  African  Americans,  1419 
Native  Hawaiians,  1406  Latinos,  4715  Japanese  Americans,  and  6062  Whites.  Of  the  people  smoking  at  least 
31  cigarettes  per  day,  there  were  759  African  Americans,  788  Native  Hawaiians,  800  Latinos,  2305  Japanese 
Americans, and 3970 Whites. (Source: http://www.nejm.org/doi/full/10.1056/NEJMoa033250)

3.9.3  Complete  the  Table 

Complete  the  table  below  using  the  data  provided. 


Smoking Levels by Ethnicity

Smoking Level   African American   Native Hawaiian   Latino   Japanese Americans   White   TOTALS
1-10
11-20
21-30
31+
TOTALS

Table  3.4 


3.9.4  Analyze  the  Data 

Suppose  that  one  person  from  the  study  is  randomly  selected. 

Exercise  3.9.1  (Solution  on  p.  161.) 

Find  the  probability  that  person  smoked  11-20  cigarettes  per  day. 

Exercise  3.9.2  (Solution  on  p.  161.) 

Find  the  probability  that  person  was  Latino. 


*This content is available online at <http://cnx.org/content/m16839/1.11/>.
3.9.5 Discussion Questions

Exercise  3.9.3  (Solution  on  p.  161.) 

In  words,  explain  what  it  means  to  pick  one  person  from  the  study  and  that  person  is  "Japanese 
American  AND  smokes  21-30  cigarettes  per  day."  Also,  find  the  probability. 

Exercise  3.9.4  (Solution  on  p.  161.) 

In  words,  explain  what  it  means  to  pick  one  person  from  the  study  and  that  person  is  "Japanese 
American  OR  smokes  21-30  cigarettes  per  day."  Also,  find  the  probability. 

Exercise  3.9.5  (Solution  on  p.  161.) 

In  words,  explain  what  it  means  to  pick  one  person  from  the  study  and  that  person  is  "Japanese 
American  GIVEN  that  person  smokes  21-30  cigarettes  per  day."  Also,  find  the  probability. 

Exercise  3.9.6 

Prove that smoking level/day and ethnicity are dependent events.


NOTE:  Use  probability  rules  to  solve  the  problems  below.  Show  your  work. 
3.10.2  Given 

48% of all California registered voters prefer life in prison without parole over the death penalty for
a person convicted of first degree murder. Among Latino California registered voters, 55% prefer life
in prison without parole over the death penalty for a person convicted of first degree murder. (Source:
http://field.com/fieldpollonline/subscribers/Rls2393.pdf).
37.6% of all Californians are Latino (Source: U.S. Census Bureau).

In  this  problem,  let: 

• C = Californians (registered voters) preferring life in prison without parole over the death penalty for a person convicted of first degree murder

• L = Latino Californians

Suppose that one Californian is randomly selected.


3.10.3  Analyze  the  Data 


Exercise 3.10.1 (Solution on p. 161.)

P(C) =

Exercise 3.10.2 (Solution on p. 161.)

P(L) =

Exercise 3.10.3 (Solution on p. 161.)

P(C | L) =

Exercise 3.10.4

In words, what is "C | L"?

Exercise 3.10.5 (Solution on p. 161.)

P(L AND C) =

Exercise 3.10.6

In words, what is "L and C"?
Exercise  3.10.7 

Are  L  and  C  independent  events?  Show  why  or  why  not. 
Exercise  3.10.8 


(Solution  on  p.  162.) 


(Solution  on  p.  162.) 


P  {L  OR  C)  = 


Exercise  3.10.9 

In  words,  what  is  "L  or  C"? 

Exercise  3.10.10 

Are  L  and  C  mutually  exclusive  events?  Show  why  or  why  not. 


(Solution  on  p.  162.) 
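
The probability rules this practice relies on can be checked with a short calculation. The following is a minimal Python sketch, not part of the original text: the variable names are illustrative assumptions, and the three input values are the ones quoted in the "Given" paragraph.

```python
# Values quoted in the "Given" paragraph (variable names are assumptions).
p_c = 0.48           # P(C): prefers life in prison without parole
p_l = 0.376          # P(L): is Latino
p_c_given_l = 0.55   # P(C | L)

# Multiplication rule: P(L AND C) = P(C | L) * P(L)
p_l_and_c = p_c_given_l * p_l

# Addition rule: P(L OR C) = P(L) + P(C) - P(L AND C)
p_l_or_c = p_l + p_c - p_l_and_c

# L and C are independent only if P(C | L) equals P(C).
independent = abs(p_c_given_l - p_c) < 1e-9

# L and C are mutually exclusive only if P(L AND C) equals 0.
mutually_exclusive = p_l_and_c == 0

print(round(p_l_and_c, 4), round(p_l_or_c, 4), independent, mutually_exclusive)
```

Comparing a conditional probability with the unconditional one is only one of several equivalent independence tests; comparing P(L AND C) with P(L) · P(C) would serve equally well.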


This content is available online at <http://cnx.org/content/m16840/1.12/>.

3.11 Homework

Exercise  3.11.1  (Solution  on  p.  162.) 

Suppose  that  you  have  8  cards.  5  are  green  and  3  are  yellow.  The  5  green  cards  are  numbered 
1,  2,  3,  4,  and  5.  The  3  yellow  cards  are  numbered  1,  2,  and  3.  The  cards  are  well  shuffled.  You 
randomly  draw  one  card. 

• G = card drawn is green

• E = card drawn is even-numbered


a. List the sample space.

b. P(G) =

c. P(G|E) =

d. P(G AND E) =

e. P(G OR E) =

f. Are G and E mutually exclusive? Justify your answer numerically.
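
When every outcome is equally likely, probabilities like these can be found by listing the sample space and counting. The sketch below is one hedged way to do that in Python for the eight cards described above; the helper name `prob` and the tuple encoding of a card are assumptions made for illustration.

```python
from fractions import Fraction

# Sample space: 5 green cards numbered 1-5 and 3 yellow cards numbered 1-3.
cards = [("G", n) for n in range(1, 6)] + [("Y", n) for n in range(1, 4)]

def prob(event, given=lambda card: True):
    """P(event | given) for one card drawn from equally likely cards."""
    pool = [c for c in cards if given(c)]
    return Fraction(sum(1 for c in pool if event(c)), len(pool))

def is_green(card):
    return card[0] == "G"

def is_even(card):
    return card[1] % 2 == 0

p_g = prob(is_green)
p_g_given_e = prob(is_green, given=is_even)
p_g_and_e = prob(lambda c: is_green(c) and is_even(c))
p_g_or_e = prob(lambda c: is_green(c) or is_even(c))

print(p_g, p_g_given_e, p_g_and_e, p_g_or_e)
```

A conditional probability is computed here by shrinking the sample space to the cards that satisfy the condition, which mirrors the definition P(G|E) = P(G AND E) / P(E).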
Exercise  3.11.2 

Refer  to  the  previous  problem.  Suppose  that  this  time  you  randomly  draw  two  cards,  one  at  a 
time,  and  with  replacement. 

• G1 = first card is green

• G2 = second card is green


a. Draw a tree diagram of the situation.

b. P(G1 AND G2) =

c. P(at least one green) =

d. P(G2|G1) =

e. Are G2 and G1 independent events? Explain why or why not.

Exercise  3.11.3  (Solution  on  p.  162.) 

Refer  to  the  previous  problems.  Suppose  that  this  time  you  randomly  draw  two  cards,  one  at  a 
time,  and  without  replacement. 

• G1 = first card is green

• G2 = second card is green


a. Draw a tree diagram of the situation.

b. P(G1 AND G2) =

c. P(at least one green) =

d. P(G2|G1) =

e. Are G2 and G1 independent events? Explain why or why not.
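
The difference between drawing with and without replacement can be seen by enumerating every ordered pair of cards. The following Python sketch is illustrative only; the function name `two_draw_probs` and the dictionary keys are assumptions, and the deck is the 5 green and 3 yellow cards from the exercises above.

```python
from fractions import Fraction
from itertools import permutations, product

cards = ["G"] * 5 + ["Y"] * 3   # 5 green, 3 yellow

def two_draw_probs(with_replacement):
    """Exact probabilities for two draws, counted over equally likely ordered pairs."""
    idx = range(len(cards))
    if with_replacement:
        pairs = list(product(idx, repeat=2))    # the same card may appear twice
    else:
        pairs = list(permutations(idx, 2))      # two different cards
    n = len(pairs)
    both = sum(1 for i, j in pairs if cards[i] == "G" and cards[j] == "G")
    first = sum(1 for i, j in pairs if cards[i] == "G")
    at_least_one = sum(1 for i, j in pairs if "G" in (cards[i], cards[j]))
    return {
        "P(G1 AND G2)": Fraction(both, n),
        "P(at least one green)": Fraction(at_least_one, n),
        "P(G2 | G1)": Fraction(both, first),
    }

with_r = two_draw_probs(True)
without_r = two_draw_probs(False)
print(with_r)
print(without_r)
```

With replacement, P(G2 | G1) matches the single-draw probability of green; without replacement it does not, which is the numerical signature of dependent events.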
Exercise  3.11.4 

Roll two fair dice. Each die has 6 faces.

a. List the sample space.

b. Let A be the event that either a 3 or 4 is rolled first, followed by an even number. Find P(A).

c. Let B be the event that the sum of the two rolls is at most 7. Find P(B).

d. In words, explain what "P(A|B)" represents. Find P(A|B).

e. Are A and B mutually exclusive events? Explain your answer in 1 - 3 complete sentences, including numerical justification.

f. Are A and B independent events? Explain your answer in 1 - 3 complete sentences, including numerical justification.

This content is available online at <http://cnx.org/content/m16836/1.21/>.
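
For a two-dice experiment like the one in Exercise 3.11.4, the 36 ordered outcomes can be generated and the events expressed as sets. This is a minimal sketch under that reading of the exercise; the set names `A` and `B` follow the exercise, and everything else is an illustrative assumption.

```python
from fractions import Fraction
from itertools import product

space = list(product(range(1, 7), repeat=2))   # 36 ordered (first, second) rolls

# A: a 3 or 4 is rolled first, followed by an even number.
A = {(a, b) for a, b in space if a in (3, 4) and b % 2 == 0}
# B: the sum of the two rolls is at most 7.
B = {(a, b) for a, b in space if a + b <= 7}

p_a = Fraction(len(A), len(space))
p_b = Fraction(len(B), len(space))
p_a_and_b = Fraction(len(A & B), len(space))
p_a_given_b = p_a_and_b / p_b

mutually_exclusive = p_a_and_b == 0
independent = p_a_and_b == p_a * p_b

print(p_a, p_b, p_a_and_b, p_a_given_b, mutually_exclusive, independent)
```

Using set intersection for "AND" keeps the code close to the definitions: the events are mutually exclusive exactly when the intersection is empty.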

Exercise  3.11.5  (Solution  on  p.  162.) 

A  special  deck  of  cards  has  10  cards.  Four  are  green,  three  are  blue,  and  three  are  red.  When  a 
card  is  picked,  the  color  of  it  is  recorded.  An  experiment  consists  of  first  picking  a  card  and  then 
tossing  a  coin. 

a.  List  the  sample  space. 

b. Let A be the event that a blue card is picked first, followed by landing a head on the coin toss. Find P(A).

c. Let B be the event that a red or green is picked, followed by landing a head on the coin toss. Are the events A and B mutually exclusive? Explain your answer in 1 - 3 complete sentences, including numerical justification.

d. Let C be the event that a red or blue is picked, followed by landing a head on the coin toss. Are the events A and C mutually exclusive? Explain your answer in 1 - 3 complete sentences, including numerical justification.

Exercise  3.11.6 

An  experiment  consists  of  first  rolling  a  die  and  then  tossing  a  coin: 

a.  List  the  sample  space. 

b. Let A be the event that either a 3 or 4 is rolled first, followed by landing a head on the coin toss. Find P(A).

c. Let B be the event that a number less than 2 is rolled, followed by landing a head on the coin toss. Are the events A and B mutually exclusive? Explain your answer in 1 - 3 complete sentences, including numerical justification.

Exercise  3.11.7  (Solution  on  p.  162.) 

An  experiment  consists  of  tossing  a  nickel,  a  dime  and  a  quarter.  Of  interest  is  the  side  the  coin 
lands  on. 

a.  List  the  sample  space. 

b.  Let  A  be  the  event  that  there  are  at  least  two  tails.  Find  P(A). 

c. Let B be the event that the first and second tosses land on heads. Are the events A and B mutually exclusive? Explain your answer in 1 - 3 complete sentences, including justification.

Exercise  3.11.8 

Consider the following scenario:

• Let P(C) = 0.4

• Let P(D) = 0.5

• Let P(C|D) = 0.6

a. Find P(C AND D).

b. Are C and D mutually exclusive? Why or why not?

c. Are C and D independent events? Why or why not?

d. Find P(C OR D).

e. Find P(D|C).

Exercise 3.11.9 (Solution on p. 162.)

E and F are mutually exclusive events. P(E) = 0.4; P(F) = 0.5. Find P(E | F).

Exercise 3.11.10

J and K are independent events. P(J | K) = 0.3. Find P(J).

Exercise 3.11.11 (Solution on p. 162.)

U and V are mutually exclusive events. P(U) = 0.26; P(V) = 0.37. Find:

a. P(U AND V) =

b. P(U | V) =

c. P(U OR V) =

Exercise 3.11.12

Q and R are independent events. P(Q) = 0.4; P(Q AND R) = 0.1. Find P(R).

Exercise 3.11.13 (Solution on p. 162.)

Y and Z are independent events.

a. Rewrite the basic Addition Rule P(Y OR Z) = P(Y) + P(Z) - P(Y AND Z) using the information that Y and Z are independent events.

b. Use the rewritten rule to find P(Z) if P(Y OR Z) = 0.71 and P(Y) = 0.42.

Exercise 3.11.14

G and H are mutually exclusive events. P(G) = 0.5; P(H) = 0.3

a. Explain why the following statement MUST be false: P(H | G) = 0.4.

b. Find: P(H OR G).

c. Are G and H independent or dependent events? Explain in a complete sentence.

Exercise 3.11.15 (Solution on p. 162.)

The following are real data from Santa Clara County, CA. As of a certain time, there had been
a total of 3059 documented cases of AIDS in the county. They were grouped into the following
categories (Source: Santa Clara County Public H.D.):

          Homosexual/Bisexual   IV Drug User*   Heterosexual Contact   Other   Totals
Female    0                     70              136                    49
Male      2146                  463             60                     135
Totals

Table 3.5: * includes homosexual/bisexual IV drug users


Suppose one of the persons with AIDS in Santa Clara County is randomly selected. Compute the
following:

a. P(person is female) =

b. P(person has a risk factor Heterosexual Contact) =

c. P(person is female OR has a risk factor of IV Drug User) =

d. P(person is female AND has a risk factor of Homosexual/Bisexual) =

e. P(person is male AND has a risk factor of IV Drug User) =

f. P(female GIVEN person got the disease from heterosexual contact) =

g. Construct a Venn Diagram. Make one group females and the other group heterosexual contact.
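
Probabilities from a contingency table such as Table 3.5 come from dividing a cell, row total, or column total by the appropriate denominator. The sketch below shows one way to organize that in Python using the counts printed in the table; the dictionary layout and variable names are assumptions for illustration, and the row and column totals are computed rather than taken from the text.

```python
from fractions import Fraction

# Counts from Table 3.5 (rows: sex; columns: risk factor).
table = {
    "Female": {"Homosexual/Bisexual": 0, "IV Drug User": 70,
               "Heterosexual Contact": 136, "Other": 49},
    "Male": {"Homosexual/Bisexual": 2146, "IV Drug User": 463,
             "Heterosexual Contact": 60, "Other": 135},
}

total = sum(sum(row.values()) for row in table.values())
row_total = {sex: sum(row.values()) for sex, row in table.items()}
col_total = {risk: sum(table[sex][risk] for sex in table)
             for risk in table["Female"]}

# Marginal probability: row total over grand total.
p_female = Fraction(row_total["Female"], total)

# Conditional probability: the denominator shrinks to the given column.
p_female_given_het = Fraction(table["Female"]["Heterosexual Contact"],
                              col_total["Heterosexual Contact"])

# "OR": add the row and column totals, then subtract the doubly counted cell.
p_female_or_iv = Fraction(
    row_total["Female"] + col_total["IV Drug User"] - table["Female"]["IV Drug User"],
    total,
)

print(total, p_female, p_female_given_het, p_female_or_iv)
```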
Exercise 3.11.16

Solve  these  questions  using  probability  rules.  Do  NOT  use  the  contingency  table  above.  3059 
cases  of  AIDS  had  been  reported  in  Santa  Clara  County,  CA,  through  a  certain  date.  Those  cases 
will  be  our  population.  Of  those  cases,  6.4%  obtained  the  disease  through  heterosexual  contact 
and  7.4%  are  female.  Out  of  the  females  with  the  disease,  53.3%  got  the  disease  from  heterosexual 
contact. 

a.  P(person  is  female)  = 

b.  P(person  obtained  the  disease  through  heterosexual  contact)  = 

c.  P(female  GIVEN  person  got  the  disease  from  heterosexual  contact)  = 

d.  Construct  a  Venn  Diagram.  Make  one  group  females  and  the  other  group  heterosexual  contact. 

Fill  in  all  values  as  probabilities. 

Exercise  3.11.17  (Solution  on  p.  163.) 

The  following  table  identifies  a  group  of  children  by  one  of  four  hair  colors,  and  by  type  of  hair. 


Hair Type   Brown   Blond   Black   Red   Totals
Wavy        20              15      3     43
Straight    80      15              12
Totals              20                    215

Table 3.6


a.  Complete  the  table  above. 

b.  What  is  the  probability  that  a  randomly  selected  child  will  have  wavy  hair? 

c.  What  is  the  probability  that  a  randomly  selected  child  will  have  either  brown  or  blond  hair? 

d.  What  is  the  probability  that  a  randomly  selected  child  will  have  wavy  brown  hair? 

e.  What  is  the  probability  that  a  randomly  selected  child  will  have  red  hair,  given  that  he  has 

straight  hair? 

f.  If  B  is  the  event  of  a  child  having  brown  hair,  find  the  probability  of  the  complement  of  B. 

g.  In  words,  what  does  the  complement  of  B  represent? 

Exercise  3.11.18 

A  previous  year,  the  weights  of  the  members  of  the  San  Francisco  49ers  and  the  Dallas  Cowboys 
were  published  in  the  San  Jose  Mercury  News.  The  factual  data  are  compiled  into  the  following 
table. 


Shirt#   ≤ 210   211-250   251-290   290 <
1-33     21      5         0         0
34-66    6       18        7         4
66-99    6       12        22        5

Table 3.7


For  the  following,  suppose  that  you  randomly  select  one  player  from  the  49ers  or  Cowboys. 

a.  Find  the  probability  that  his  shirt  number  is  from  1  to  33. 

b. Find the probability that he weighs at most 210 pounds.

c. Find the probability that his shirt number is from 1 to 33 AND he weighs at most 210 pounds.

d. Find the probability that his shirt number is from 1 to 33 OR he weighs at most 210 pounds.

e. Find the probability that his shirt number is from 1 to 33 GIVEN that he weighs at most 210 pounds.

f. If having a shirt number from 1 to 33 and weighing at most 210 pounds were independent events, then what should be true about P(Shirt# 1-33 | ≤ 210 pounds)?

Exercise  3.11.19  (Solution  on  p.  163.) 

Approximately 281,000,000 people over age 5 live in the United States. Of
these people, 55,000,000 speak a language other than English at home. Of
those who speak another language at home, 62.3% speak Spanish. (Source:
http://www.census.gov/hhes/socdemo/language/data/acs/ACS-12.pdf)

Let: E = speak English at home; E' = speak another language at home; S = speak Spanish

Finish each probability statement by matching the correct answer.

Probability Statements   Answers
a. P(E') =               i. 0.8043
b. P(E) =                ii. 0.623
c. P(S and E') =         iii. 0.1957
d. P(S | E') =           iv. 0.1219

Table 3.8


Exercise  3.11.20 

The probability that a male develops some form of cancer in his lifetime is 0.4567 (Source: American
Cancer Society). The probability that a male has at least one false positive test result (meaning
the test comes back for cancer when the man does not have it) is 0.51 (Source: USA Today). Some of
the  questions  below  do  not  have  enough  information  for  you  to  answer  them.  Write  "not  enough 
information"  for  those  answers. 

Let:  C  =  a  man  develops  cancer  in  his  lifetime;  P  =  man  has  at  least  one  false  positive 

a.  Construct  a  tree  diagram  of  the  situation. 

b.  P(C)  = 

c.  P(P|C)  = 

d.  P(P|C' )  = 

e.  If  a  test  comes  up  positive,  based  upon  numerical  values,  can  you  assume  that  man  has  cancer? 

Justify  numerically  and  explain  why  or  why  not. 

Exercise  3.11.21  (Solution  on  p.  163.) 

In  1994,  the  U.S.  government  held  a  lottery  to  issue  55,000  Green  Cards  (permits  for  non-citizens 
to  work  legally  in  the  U.S.).  Renate  Deutsch,  from  Germany,  was  one  of  approximately  6.5  million 
people  who  entered  this  lottery.  Let  G  =  won  Green  Card. 

a. What was Renate's chance of winning a Green Card? Write your answer as a probability statement.

b. In the summer of 1994, Renate received a letter stating she was one of 110,000 finalists chosen. Once the finalists were chosen, assuming that each finalist had an equal chance to win, what was Renate's chance of winning a Green Card? Let F = was a finalist. Write your answer as a conditional probability statement.

c. Are G and F independent or dependent events? Justify your answer numerically and also explain why.

d. Are G and F mutually exclusive events? Justify your answer numerically and also explain why.

NOTE: P.S. Amazingly, on 2/1/95, Renate learned that she would receive her Green Card - true
story!

Exercise  3.11.22 

Three  professors  at  George  Washington  University  did  an  experiment  to  determine  if  economists 
are  more  selfish  than  other  people.  They  dropped  64  stamped,  addressed  envelopes  with  $10  cash 
in  different  classrooms  on  the  George  Washington  campus.  44%  were  returned  overall.  From  the 
economics  classes  56%  of  the  envelopes  were  returned.  From  the  business,  psychology,  and  history 
classes 31% were returned. (Source: Wall Street Journal)

Let: R = money returned; E = economics classes; O = other classes

a. Write a probability statement for the overall percent of money returned.

b. Write a probability statement for the percent of money returned out of the economics classes.

c. Write a probability statement for the percent of money returned out of the other classes.

d. Is money being returned independent of the class? Justify your answer numerically and explain it.

e. Based upon this study, do you think that economists are more selfish than other people? Explain why or why not. Include numbers to justify your answer.

Exercise  3.11.23  (Solution  on  p.  163.) 

The  chart  below  gives  the  number  of  suicides  estimated  in  the  U.S.  for  a  recent  year  by  age,  race 
(black  and  white),  and  sex.  We  are  interested  in  possible  relationships  between  age,  race,  and  sex. 
We will let suicide victims be our population. (Source: The National Center for Health Statistics,
U.S. Dept of Health and Human Services)

Race and Sex    1-14   15-24   25-64    over 64   TOTALS
white, male     210    3360    13,610             22,050
white, female   80     580     3380               4930
black, male     10     460     1060               1670
black, female   0      40      270                330
all others
TOTALS          310    4650    18,780             29,760

Table 3.9


NOTE:  Do  not  include  "all  others"  for  parts  (f),  (g),  and  (i). 

a.  Fill  in  the  column  for  the  suicides  for  individuals  over  age  64. 

b.  Fill  in  the  row  for  all  other  races. 

c.  Find  the  probability  that  a  randomly  selected  individual  was  a  white  male. 

d.  Find  the  probability  that  a  randomly  selected  individual  was  a  black  female. 

e. Find the probability that a randomly selected individual was black.

f. Find the probability that a randomly selected individual was male.

g. Out of the individuals over age 64, find the probability that a randomly selected individual was a black or white male.

h. Comparing "Race and Sex" to "Age," which two groups are mutually exclusive? How do you know?

i. Are being male and committing suicide over age 64 independent events? How do you know?

The next two questions refer to the following: The percent of licensed U.S. drivers (from a recent year)
that are female is 48.60%. Of the females, 5.03% are age 19 and under; 81.36% are age 20 - 64; 13.61% are age
65 or over. Of the licensed U.S. male drivers, 5.04% are age 19 and under; 81.43% are age 20 - 64; 13.53% are
age 65 or over. (Source: Federal Highway Administration, U.S. Dept. of Transportation)

Exercise  3.11.24 

Complete  the  following: 

a.  Construct  a  table  or  a  tree  diagram  of  the  situation. 

b.  P(driver  is  female)  = 

c. P(driver is age 65 or over | driver is female) =

d. P(driver is age 65 or over AND female) =

e. In words, explain the difference between the probabilities in part (c) and part (d).

f. P(driver is age 65 or over) =

g. Are being age 65 or over and being female mutually exclusive events? How do you know?

Exercise  3.11.25  (Solution  on  p.  163.) 

Suppose  that  10,000  U.S.  licensed  drivers  are  randomly  selected. 

a.  How  many  would  you  expect  to  be  male? 

b. Using the table or tree diagram from the previous exercise, construct a contingency table of gender versus age group.

c. Using the contingency table, find the probability that out of the age 20 - 64 group, a randomly selected driver is female.
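
Turning a set of percentages into a contingency table of expected counts amounts to multiplying along each branch of the tree: n · P(sex) · P(age group | sex). The Python sketch below illustrates this for the driver percentages quoted above; the variable names are assumptions, and the expected counts are left unrounded.

```python
n = 10_000                      # number of randomly selected drivers
p_female = 0.4860               # percent of licensed drivers that are female

# P(age group | sex), as quoted in the paragraph before Exercise 3.11.24.
age_given_sex = {
    "Female": {"19 and under": 0.0503, "20-64": 0.8136, "65 or over": 0.1361},
    "Male": {"19 and under": 0.0504, "20-64": 0.8143, "65 or over": 0.1353},
}
p_sex = {"Female": p_female, "Male": 1 - p_female}

# Expected count in each cell: n * P(sex) * P(age group | sex).
expected = {
    sex: {age: n * p_sex[sex] * p for age, p in ages.items()}
    for sex, ages in age_given_sex.items()
}

expected_male = n * p_sex["Male"]

# P(female | age 20-64): the female cell divided by the 20-64 column total.
col_20_64 = sum(expected[sex]["20-64"] for sex in expected)
p_female_given_20_64 = expected["Female"]["20-64"] / col_20_64

for sex, row in expected.items():
    print(sex, {age: round(count, 1) for age, count in row.items()})
print(round(expected_male), round(p_female_given_20_64, 4))
```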

Exercise  3.11.26 

Approximately 86.5% of Americans commute to work by car, truck or van. Out of that group,
84.6% drive alone and 15.4% drive in a carpool. Approximately 3.9% walk to work and approximately
5.3% take public transportation. (Source: Bureau of the Census, U.S. Dept. of Commerce.
Disregard rounding approximations.)

a. Construct a table or a tree diagram of the situation. Include a branch for all other modes of transportation to work.

b. Assuming that the walkers walk alone, what percent of all commuters travel alone to work?

c. Suppose that 1000 workers are randomly selected. How many would you expect to travel alone to work?

d. Suppose that 1000 workers are randomly selected. How many would you expect to drive in a carpool?

Exercise  3.11.27 

Explain  what  is  wrong  with  the  following  statements.  Use  complete  sentences. 

a. If there's a 60% chance of rain on Saturday and a 70% chance of rain on Sunday, then there's a 130% chance of rain over the weekend.

b. The probability that a baseball player hits a home run is greater than the probability that he gets a successful hit.

3.11.1 Try these multiple choice questions.

The next two questions refer to the following probability tree diagram which shows tossing an unfair
coin FOLLOWED BY drawing one bead from a cup containing 3 red (R), 4 yellow (Y) and 5 blue (B) beads.
For the coin, P(H) = 2/3 and P(T) = 1/3 where H = "heads" and T = "tails".


Figure  3.3 


Exercise 3.11.28 (Solution on p. 164.)

Find P(tossing a Head on the coin AND a Red bead)

A. 2/3
B. 5/15
C. 6/36
D. 5/36

Exercise 3.11.29 (Solution on p. 164.)

Find P(Blue bead).

A. 15/36
B. 10/36
C. 10/12
D. 6/36

The next three questions refer to the following table of data obtained from www.baseball-almanac.com
showing hit information for 4 well known baseball players. Suppose that one hit from the table is randomly
selected.

NAME              Single   Double   Triple   Home Run   TOTAL HITS
Babe Ruth         1517     506      136      714        2873
Jackie Robinson   1054     273      54       137        1518
Ty Cobb           3603     174      295      114        4189
Hank Aaron        2294     624      98       755        3771
TOTAL             8471     1577     583      1720       12351

Table 3.10

Exercise 3.11.30 (Solution on p. 164.)

Find P(hit was made by Babe Ruth).

A. 1518/2873
B. 2873/12351
C. 583/12351
D. 4189/12351

Exercise 3.11.31 (Solution on p. 164.)

Find P(hit was made by Ty Cobb | The hit was a Home Run)

A. 4189/12351
B. 114/1720
C. 1720/4189
D. 114/12351

Exercise 3.11.32 (Solution on p. 164.)

Are the hit being made by Hank Aaron and the hit being a double independent events?

A. Yes, because P(hit by Hank Aaron | hit is a double) = P(hit by Hank Aaron)

B. No, because P(hit by Hank Aaron | hit is a double) ≠ P(hit is a double)

C. No, because P(hit is by Hank Aaron | hit is a double) ≠ P(hit by Hank Aaron)

D. Yes, because P(hit is by Hank Aaron | hit is a double) = P(hit is a double)

Exercise 3.11.33 (Solution on p. 164.)

Given events G and H: P(G) = 0.43; P(H) = 0.26; P(H and G) = 0.14

A. Find P(H or G)

B. Find the probability of the complement of event (H and G)

C. Find the probability of the complement of event (H or G)

Exercise 3.11.34 (Solution on p. 164.)

Given events J and K: P(J) = 0.18; P(K) = 0.37; P(J or K) = 0.45

A. Find P(J and K)

B. Find the probability of the complement of event (J and K)

C. Find the probability of the complement of event (J or K)

Exercise 3.11.35 (Solution on p. 164.)

United Blood Services is a blood bank that serves more than 500 hospitals in 18 states. According
to their website, http://www.unitedbloodservices.org/humanbloodtypes.html, a person with
type O blood and a negative Rh factor (Rh-) can donate blood to any person with any blood type.
Their data show that 43% of people have type O blood and 15% of people have Rh- factor; 52%
of people have type O or Rh- factor.

A. Find the probability that a person has both type O blood and the Rh- factor.

B. Find the probability that a person does NOT have both type O blood and the Rh- factor.

Exercise  3.11.36  (Solution  on  p.  164.) 

At  a  college,  72%  of  courses  have  final  exams  and  46%  of  courses  require  research  papers.  Suppose 
that  32%  of  courses  have  a  research  paper  and  a  final  exam.  Let  F  be  the  event  that  a  course  has  a 
final  exam.  Let  R  be  the  event  that  a  course  requires  a  research  paper. 

A.  Find  the  probability  that  a  course  has  a  final  exam  or  a  research  project. 

B.  Find  the  probability  that  a  course  has  NEITHER  of  these  two  requirements. 

Exercise  3.11.37  (Solution  on  p.  164.) 

In  a  box  of  assorted  cookies,  36%  contain  chocolate  and  12%  contain  nuts.  Of  those,  8%  contain 
both  chocolate  and  nuts.  Sean  is  allergic  to  both  chocolate  and  nuts. 

A.  Find  the  probability  that  a  cookie  contains  chocolate  or  nuts  (he  can't  eat  it). 

B.  Find  the  probability  that  a  cookie  does  not  contain  chocolate  or  nuts  (he  can  eat  it). 

Exercise  3.11.38  (Solution  on  p.  164.) 

A  college  finds  that  10%  of  students  have  taken  a  distance  learning  class  and  that  40%  of  students 
are  part  time  students.  Of  the  part  time  students,  20%  have  taken  a  distance  learning  class.  Let  D 
=  event  that  a  student  takes  a  distance  learning  class  and  E  =  event  that  a  student  is  a  part  time 
student 

A. Find P(D and E)

B. Find P(E | D)

C. Find P(D or E)

D.  Using  an  appropriate  test,  show  whether  D  and  E  are  independent. 

E.  Using  an  appropriate  test,  show  whether  D  and  E  are  mutually  exclusive. 

Exercise  3.11.39  (Solution  on  p.  164.) 

When  the  Euro  coin  was  introduced  in  2002,  two  math  professors  had  their  statistics  students  test 
whether  the  Belgian  1  Euro  coin  was  a  fair  coin.  They  spun  the  coin  rather  than  tossing  it,  and  it 
was  found  that  out  of  250  spins,  140  showed  a  head  (event  H)  while  110  showed  a  tail  (event  T). 
Therefore,  they  claim  that  this  is  not  a  fair  coin. 

A.  Based  on  the  data  above,  find  P(H)  and  P(T). 

B.  Use  a  tree  to  find  the  probabilities  of  each  possible  outcome  for  the  experiment  of  tossing  the 

coin  twice. 

C.  Use  the  tree  to  find  the  probability  of  obtaining  exactly  one  head  in  two  tosses  of  the  coin. 

D.  Use  the  tree  to  find  the  probability  of  obtaining  at  least  one  head. 
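
A probability tree for repeated trials can be represented as a mapping from each path to the product of its branch probabilities. The sketch below does this for two spins of the coin, using the relative frequencies 140/250 and 110/250 from the exercise as estimates of P(H) and P(T). It assumes the two spins are independent, and the names `p` and `paths` are illustrative.

```python
from fractions import Fraction
from itertools import product

# Empirical estimates from 250 spins (140 heads, 110 tails).
p = {"H": Fraction(140, 250), "T": Fraction(110, 250)}

# Each path through the two-level tree multiplies its branch probabilities
# (assumes the spins are independent of each other).
paths = {a + b: p[a] * p[b] for a, b in product("HT", repeat=2)}

p_exactly_one_head = paths["HT"] + paths["TH"]
p_at_least_one_head = 1 - paths["TT"]   # complement of "no heads"

for path, prob in paths.items():
    print(path, prob, float(prob))
print(float(p_exactly_one_head), float(p_at_least_one_head))
```

The path probabilities must add to 1, which is a quick check that the tree has been drawn correctly.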

Exercise  3.11.40 

A box of cookies contains 3 chocolate and 7 butter cookies. Miguel randomly selects a cookie and
eats it. Then he randomly selects another cookie and eats it also. (How many cookies did he take?)

A. Draw the tree that represents the possibilities for the cookie selections. Write the probabilities along each branch of the tree.

B. Are the probabilities for the flavor of the SECOND cookie that Miguel selects independent of his first selection? Explain.

C. For each complete path through the tree, write the event it represents and find the probabilities.

D. Let S be the event that both cookies selected were the same flavor. Find P(S).

E. Let T be the event that both cookies selected were different flavors. Find P(T) by two different methods: by using the complement rule and by using the branches of the tree. Your answers should be the same with both methods.

F. Let U be the event that the second cookie selected is a butter cookie. Find P(U).

Exercises 33 - 40 contributed by Roberta Bloom

3.12 Review


The  first  six  exercises  refer  to  the  following  study:  In  a  survey  of  100  stocks  on  NASDAQ,  the  average 
percent  increase  for  the  past  year  was  9%  for  NASDAQ  stocks.  Answer  the  following: 

Exercise  3.12.1  (Solution  on  p.  165.) 

The  "average  increase"  for  all  NASDAQ  stocks  is  the: 

A.  Population 

B.  Statistic 

C.  Parameter 

D.  Sample 

E.  Variable 


Exercise  3.12.2  (Solution  on  p.  165.) 

All of the NASDAQ stocks are the:

A.  Population 

B.  Statistic 

C.  Parameter 

D.  Sample 

E.  Variable 


Exercise  3.12.3  (Solution  on  p.  165.) 

9%  is  the: 

A.  Population 

B.  Statistic 

C.  Parameter 

D.  Sample 

E.  Variable 


Exercise  3.12.4  (Solution  on  p.  165.) 

The  100  NASDAQ  stocks  in  the  survey  are  the: 

A.  Population 

B.  Statistic 

C.  Parameter 

D.  Sample 

E.  Variable 


Exercise  3.12.5  (Solution  on  p.  165.) 

The  percent  increase  for  one  stock  in  the  survey  is  the: 

A.  Population 

B.  Statistic 

C.  Parameter 

D.  Sample 

E.  Variable 

This content is available online at <http://cnx.org/content/m16842/1.9/>.

Exercise 3.12.6 (Solution on p. 165.)

Would  the  data  collected  be  qualitative,  quantitative  -  discrete,  or  quantitative  -  continuous? 

The  next  two  questions  refer  to  the  following  study:  Thirty  people  spent  two  weeks  around  Mardi  Gras 
in  New  Orleans.  Their  two-week  weight  gain  is  below.  (Note:  a  loss  is  shown  by  a  negative  weight  gain.) 


Weight Gain   Frequency
-2            3
-1            5
0             2
1             4
4             13
6             2
11            1

Table 3.11

Exercise  3.12.7  (Solution  on  p.  165.) 

Calculate  the  following  values: 

a.  The  average  weight  gain  for  the  two  weeks 

b.  The  standard  deviation 

c.  The  first,  second,  and  third  quartiles 

Exercise  3.12.8 

Construct  a  histogram  and  a  boxplot  of  the  data. 
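
Summary statistics for a frequency table can be computed by expanding the table into the full list of data values. The sketch below does this for Table 3.11 with Python's standard `statistics` module. It assumes the sample standard deviation is wanted and that the first and third quartiles are taken as the medians of the lower and upper halves of the sorted data; other quartile conventions can give slightly different values.

```python
import statistics

# Table 3.11: weight gain -> frequency.
freq = {-2: 3, -1: 5, 0: 2, 1: 4, 4: 13, 6: 2, 11: 1}

# Expand the table into the 30 individual data values, in sorted order.
data = sorted(gain for gain, count in freq.items() for _ in range(count))

mean = statistics.mean(data)
sample_sd = statistics.stdev(data)      # divides by n - 1

# Quartiles as medians of the lower half, the whole set, and the upper half.
half = len(data) // 2
q1 = statistics.median(data[:half])
median = statistics.median(data)
q3 = statistics.median(data[-half:])

print(len(data), round(mean, 2), round(sample_sd, 2), q1, median, q3)
```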
3.13 Lab: Probability Topics

Class  time: 
Names: 

3.13.1  Student  Learning  Outcomes: 

•  The  student  will  use  theoretical  and  empirical  methods  to  estimate  probabilities. 

•  The  student  will  appraise  the  differences  between  the  two  estimates. 

• The student will demonstrate an understanding of long-term relative frequencies.


3.13.2  Do  the  Experiment: 

Count out 40 mixed-color M&M's® which is approximately 1 small bag's worth (distance learning classes
using the virtual lab would want to count out 25 M&M's®). Record the number of each color in the
"Population" table. Use the information from this table to complete the theoretical probability questions. Next,
put the M&M's in a cup. The experiment is to pick 2 M&M's, one at a time. Do not look at them as you
pick them. The first time through, replace the first M&M before picking the second one. Record the results
in the "With Replacement" column of the "Empirical Results" table. Do this 24 times. The second time through, after
picking the first M&M, do not replace it before picking the second one. Then, pick the second one. Record
the results in the "Without Replacement" column of the "Empirical Results" table. After you record
the pick, put both M&M's back. Do this a total of 24 times, also. Use the data from the "Empirical Results"
table to calculate the empirical probability questions. Leave your answers in unreduced fractional form.
Do not multiply out any fractions.
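
The experiment can also be simulated, which is a convenient way to compare empirical and theoretical probabilities for many more than 24 trials. The following Python sketch is not part of the lab: the color counts in `population` are hypothetical placeholders for your own "Population" table, and the function names are assumptions. It estimates P(doubles) by simulation and computes the exact value for comparison.

```python
import random
from fractions import Fraction

# Hypothetical counts for a 40-candy cup; substitute your own "Population" table.
population = {"Y": 8, "G": 6, "BL": 7, "B": 5, "O": 7, "R": 7}
cup = [color for color, qty in population.items() for _ in range(qty)]

def theoretical_doubles(with_replacement):
    """Exact P(doubles): both picks show the same color."""
    n = len(cup)
    if with_replacement:
        return sum(Fraction(q, n) ** 2 for q in population.values())
    return sum(Fraction(q * (q - 1), n * (n - 1)) for q in population.values())

def empirical_doubles(with_replacement, trials=24, seed=1):
    """Relative frequency of doubles in `trials` simulated two-candy picks."""
    rng = random.Random(seed)
    doubles = 0
    for _ in range(trials):
        if with_replacement:
            first, second = rng.choice(cup), rng.choice(cup)
        else:
            first, second = rng.sample(cup, 2)   # second pick excludes the first candy
        doubles += first == second
    return Fraction(doubles, trials)

for trials in (24, 240, 24000):
    print(trials, float(empirical_doubles(True, trials)), float(theoretical_doubles(True)))
```

Running the loop with larger trial counts shows the long-term relative frequency settling near the theoretical value, which is the idea behind the discussion questions that follow.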

Population

Color        Quantity
Yellow (Y)
Green (G)
Blue (BL)
Brown (B)
Orange (O)
Red (R)

Table 3.12

This content is available online at <http://cnx.org/content/m16841/1.15/>.

Theoretical Probabilities

                   With Replacement   Without Replacement
P(2 reds)
P(R1B2 OR B1R2)
P(R1 AND G2)
P(G2 | R1)
P(no yellows)
P(doubles)
P(no doubles)

Table 3.13: Note: G2 = green on second pick; R1 = red on first pick; B1 = brown on first pick; B2 = brown on
second pick; doubles = both picks are the same colour.

Empirical Results

With Replacement            Without Replacement
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )
( __ , __ ) ( __ , __ )     ( __ , __ ) ( __ , __ )

Table 3.14

Empirical Probabilities

                   With Replacement   Without Replacement
P(2 reds)
P(R1B2 OR B1R2)
P(R1 AND G2)
P(G2 | R1)
P(no yellows)
P(doubles)
P(no doubles)

Table 3.15: Note:


3.13.3  Discussion  Questions 

1.  Why  are  the  "With  Replacement"  and  "Without  Replacement"  probabilities  different? 

2. Convert P(no yellows) to decimal format for both Theoretical "With Replacement" and for Empirical
"With Replacement". Round to 4 decimal places.

a. Theoretical "With Replacement": P(no yellows) = ________
b. Empirical "With Replacement": P(no yellows) = ________
c. Are the decimal values "close"? Did you expect them to be closer together or farther apart? Why?

3. If you increased the number of times you picked 2 M&M's to 240 times, why would empirical probability
values change?

4.  Would  this  change  (see  (3)  above)  cause  the  empirical  probabilities  and  theoretical  probabilities  to  be 
closer  together  or  farther  apart?  How  do  you  know? 

5. Explain the differences in what P(G1 AND R2) and P(R1 | G2) represent. Hint: Think about the
sample space for each probability.

Solutions to Exercises in Chapter 3

Solution to Exercise 3.2.1 (p. 124)

a. P(L') = P(S)
b. P(M or S)
c. P(F and L)
d. P(M | L)
e. P(L | M)
f. P(S | F)
g. P(F | L)
h. P(F or L)
i. P(M and S)
j. P(F)

Solution  to  Example  3.2,  Problem  (p.  126) 

No. C = {3, 5} and E = {1, 2, 3, 4}. P(C AND E) = 1/6. To be mutually exclusive, P(C AND E) must be 0.

Solution  to  Example  3.11,  Problem  1  (p.  132) 


Hiking Area Preference

Sex       The Coastline   Near Lakes and Streams   On Mountain Peaks   Total
Female    18              16                       11                  45
Male      16              25                       14                  55
Total     34              41                       25                  100

Table 3.16


Solution  to  Example  3.11,  Problem  2  (p.  132) 

a. P(F AND C) = 18/100 = 0.18

b. P(F) · P(C) = (45/100)(34/100) = 0.45 · 0.34 = 0.153

P(F AND C) ≠ P(F) · P(C), so the events F and C are not independent.
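The independence check above can be reproduced directly from the counts in Table 3.16. The following is a minimal sketch; the dictionary layout and variable names are illustrative choices, not part of the original solution.

```python
from fractions import Fraction

# Counts from the hiking-preference table (Table 3.16).
counts = {
    ("Female", "Coastline"): 18, ("Female", "Lakes"): 16, ("Female", "Peaks"): 11,
    ("Male", "Coastline"): 16, ("Male", "Lakes"): 25, ("Male", "Peaks"): 14,
}
total = sum(counts.values())  # 100 people surveyed

# Joint and marginal probabilities as exact fractions.
p_f_and_c = Fraction(counts[("Female", "Coastline")], total)
p_f = Fraction(sum(n for (sex, _), n in counts.items() if sex == "Female"), total)
p_c = Fraction(sum(n for (_, area), n in counts.items() if area == "Coastline"), total)

# F and C are independent only if P(F AND C) equals P(F) * P(C).
independent = p_f_and_c == p_f * p_c
print(float(p_f_and_c), float(p_f * p_c), independent)  # 0.18 0.153 False
```

Exact fractions are used so that the equality test is not affected by floating-point rounding.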
Solution  to  Example  3.11,  Problem  3  (p.  133) 

a.  The  word  'given'  tells  you  that  this  is  a  conditional. 

b. P(M | L) = 25/41

c.  No,  the  sample  space  for  this  problem  is  41. 
Solution  to  Example  3.11,  Problem  4  (p.  133) 

a. P(F) = 45/100

b. P(P) = 25/100

c. P(F AND P) = 11/100

d. P(F OR P) = 45/100 + 25/100 - 11/100 = 59/100

Solution to Example 3.12, Problem 1 (p. 133)

Door Choice

Caught or Not   Door One   Door Two   Door Three   Total
Caught          1/15       1/12       1/6          19/60
Not Caught      4/15       3/12       1/6          41/60
Total           5/15       4/12       2/6          1

Table 3.17


Solution  to  Example  3.16,  Problem  1  (p.  136) 

B1R1; B1R2; B1R3; B2R1; B2R2; B2R3; B3R1; B3R2; B3R3; B4R1; B4R2; B4R3; B5R1; B5R2; B5R3; B6R1;
B6R2;  B6R3;  B7R1;  B7R2;  B7R3;  B8R1;  B8R2;  B8R3 
Solution  to  Example  3.16,  Problem  6  (p.  137) 

P(BB) = 64/121

Solution  to  Example  3.16,  Problem  7  (p.  137) 

P(B on 2nd draw | R on 1st draw) = 24/33

There are 9 + 24 outcomes that have R on the first draw (9 RR and 24 RB). The sample space is then
9 + 24 = 33. Twenty-four of the 33 outcomes have B on the second draw. The probability is then 24/33.
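The counting argument above can be checked by listing the outcomes. This sketch assumes 3 red cards (R1 to R3) and 8 blue cards (B1 to B8) drawn with replacement, which is what the counts 9 RR and 24 RB imply; the card labels are illustrative.

```python
from itertools import product
from fractions import Fraction

# Assumed deck: 3 red and 8 blue cards, consistent with 9 RR and 24 RB outcomes.
cards = [f"R{i}" for i in range(1, 4)] + [f"B{i}" for i in range(1, 9)]

# Drawing twice with replacement gives every ordered pair of cards.
draws = list(product(cards, repeat=2))

# Reduced sample space: only the outcomes with red on the first draw.
red_first = [d for d in draws if d[0].startswith("R")]
blue_second = [d for d in red_first if d[1].startswith("B")]

p = Fraction(len(blue_second), len(red_first))
print(len(red_first), len(blue_second), p)  # 33 24 8/11
```

The fraction 24/33 reduces to 8/11, so both forms describe the same conditional probability.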

Solution  to  Example  3.17,  Problem  2  (p.  138) 

P(RB or BR) = (3/11)(8/10) + (8/11)(3/10) = 48/110

Solution to Example 3.17, Problem 3 (p. 138)

P(R on 2nd | B on 1st) = 3/10

Solution to Example 3.17, Problem 4 (p. 138)

P(R on 1st and B on 2nd) = P(RB) = (3/11)(8/10) = 24/110

Solution to Example 3.17, Problem 5 (p. 138)

P(BB) = (8/11)(7/10)


Solutions  to  Practice  1:  Contingency  Tables 

Solution  to  Exercise  3.9.1  (p.  141) 

35,065/100,450

Solution to Exercise 3.9.2 (p. 141)

19,969/100,450

Solution to Exercise 3.9.3 (p. 142)

4,715/100,450

Solution to Exercise 3.9.4 (p. 142)

36,636/100,450

Solution to Exercise 3.9.5 (p. 142)

4,715/15,273


Solutions  to  Practice  2:  Calculating  Probabilities 

Solution  to  Exercise  3.10.1  (p.  143) 

0.48 

Solution  to  Exercise  3.10.2  (p.  143) 
0.376 

Solution  to  Exercise  3.10.3  (p.  143) 

0.55

Solution to Exercise 3.10.5 (p. 143)
0.2068

Solution  to  Exercise  3.10.7  (p.  143) 

No 

Solution  to  Exercise  3.10.8  (p.  143) 

0.6492 

Solution  to  Exercise  3.10.10  (p.  143) 
No 

Solutions  to  Homework 
Solution  to  Exercise  3.11.1  (p.  144) 

a. {G1, G2, G3, G4, G5, Y1, Y2, Y3}


Solution  to  Exercise  3.11.3  (p.  144) 

b.  (i)  (I) 

d.  I 

e.  No 

Solution  to  Exercise  3.11.5  (p.  145) 

a.  {GH,GT,BH,BT,RH,RT} 

D.  20 

c.  Yes 

d.  No 


Solution  to  Exercise  3.11.7  (p.  145) 


c.  Yes 


Solution  to  Exercise  3.11.9  (p.  146) 

0 

Solution  to  Exercise  3.11.11  (p.  146) 

a.  0 

b.  0 

c.  0.63 

Solution  to  Exercise  3.11.13  (p.  146) 
b.  0.5 

Solution  to  Exercise  3.11.15  (p.  146) 

The completed contingency table is as follows:

          Homosexual/Bisexual   IV Drug User*   Heterosexual Contact   Other   Totals
Female    0                     70              136                    49      255
Male      2146                  463             60                     135     2804
Totals    2146                  533             196                    184     3059

Table 3.18: * includes homosexual/bisexual IV drug users


a. 255/3059

b  ^ 

3059 

^-  3059 

d.  0 

e. 463/3059

f. 136/196

Solution  to  Exercise  3.11.17  (p.  147) 
b  ^ 

215 
c. 120/215

d  ^ 

215 

e  ^ 

f. 115/215

Solution  to  Exercise  3.11.19  (p.  148) 

a.  iii 

b.  i 

c.  iv 

d.  ii 

Solution  to  Exercise  3.11.21  (p.  148) 

a.  P  (G)  =  0.008 

b.  0.5 

c.  dependent 

d.  No 

Solution  to  Exercise  3.11.23  (p.  149) 

^  22050 


d. 


29760 
330 
29760 
2000 
29760 
23720 


'■•  29760 
„  5010 
8-  6020 

h.  Black  females  and  ages  1-14 

i.  No 

Solution  to  Exercise  3.11.25  (p.  150) 

a.  5140 
c. 0.49

Solution to Exercise 3.11.28 (p. 151)

C 

Solution  to  Exercise  3.11.29  (p.  151) 

A 

Solution  to  Exercise  3.11.30  (p.  152) 

B 

Solution  to  Exercise  3.11.31  (p.  152) 
B 

Solution  to  Exercise  3.11.32  (p.  152) 

C 

Solution  to  Exercise  3.11.33  (p.  152) 

A.  P(H  or  G)  =  P(H)  +  P(G)  -  P(H  and  G)  =  0.26  +  0.43  -  0.14  =  0.55 

B.  P(  NOT  (H  and  G) )  =  1  -  P(H  and  G)  =  1  -  0.14  =  0.86 

C.  P(  NOT  (H  or  G) )  =  1  -  P(H  or  G)  =  1  -  0.55  =  0.45 

Solution  to  Exercise  3.11.34  (p.  152) 

A. P(J or K) = P(J) + P(K) - P(J and K); 0.45 = 0.18 + 0.37 - P(J and K); solve to find P(J and K) = 0.10

B. P( NOT (J and K) ) = 1 - P(J and K) = 1 - 0.10 = 0.90

C. P( NOT (J or K) ) = 1 - P(J or K) = 1 - 0.45 = 0.55

Solution  to  Exercise  3.11.35  (p.  153) 

A.  P(Type  O  or  Rh-)  =  P(Type  O)  +  P(Rh-)  -  P(Type  O  and  Rh-) 

0.52  =  0.43  +  0.15  -  P(Type  O  and  Rh-);  solve  to  find  P(Type  O  and  Rh-)  =  0.06 
6% of people have type O Rh- blood

B. P( NOT (Type O and Rh-) ) = 1 - P(Type O and Rh-) = 1 - 0.06 = 0.94

94% of people do not have type O Rh- blood

Solution  to  Exercise  3.11.36  (p.  153) 

A. P(R or F) = P(R) + P(F) - P(R and F) = 0.72 + 0.46 - 0.32 = 0.86

B.  P(  Neither  R  nor  F  )  =  1  -  P(R  or  F)  =  1  -  0.86  =  0.14 

Solution  to  Exercise  3.11.37  (p.  153) 

Let  C  be  the  event  that  the  cookie  contains  chocolate.  Let  N  be  the  event  that  the  cookie  contains  nuts. 

A.  P(C  or  N)  =  P(C)  +  P(N)  -  P(C  and  N)  =  0.36  +  0.12  -  0.08  =  0.40 

B.  P(  neither  chocolate  nor  nuts)  =  1  -  P(C  or  N)  =  1  -  0.40  =  0.60 

Solution  to  Exercise  3.11.38  (p.  153) 

A. P(D and E) = P(D | E)P(E) = (0.20)(0.40) = 0.08

B. P(E | D) = P(D and E) / P(D) = 0.08/0.10 = 0.80

C. P(D or E) = P(D) + P(E) - P(D and E) = 0.10 + 0.40 - 0.08 = 0.42

D. Not Independent: P(D | E) = 0.20 which does not equal P(D) = 0.10

E.  Not  Mutually  Exclusive:  P(D  and  E)  =  0.08 ;  if  they  were  mutually  exclusive  then  we  would  need  to  have 

P(D  and  E)  =  0,  which  is  not  true  here. 
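The five parts of this solution chain together the multiplication rule, the definition of conditional probability, and the addition rule. A short sketch of the same arithmetic follows; the variable names are illustrative, and the three given probabilities are the ones used in the solution above.

```python
import math

# Given values used in the solution above.
p_d = 0.10
p_e = 0.40
p_d_given_e = 0.20

p_d_and_e = p_d_given_e * p_e      # multiplication rule
p_e_given_d = p_d_and_e / p_d      # definition of conditional probability
p_d_or_e = p_d + p_e - p_d_and_e   # addition rule

# Independent only if P(D | E) = P(D); mutually exclusive only if P(D and E) = 0.
independent = math.isclose(p_d_given_e, p_d)
mutually_exclusive = math.isclose(p_d_and_e, 0.0, abs_tol=1e-12)

print(round(p_d_and_e, 2), round(p_e_given_d, 2), round(p_d_or_e, 2))  # 0.08 0.8 0.42
print(independent, mutually_exclusive)  # False False
```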

Solution  to  Exercise  3.11.39  (p.  153) 

A.  P(H)  =  140/250;  P(T)  =  110/250 

C.  308/625 

D. 504/625

Solutions to Review
Solution  to  Exercise  3.12.1  (p.  155) 

C.  Parameter 

Solution  to  Exercise  3.12.2  (p.  155) 

A.  Population 

Solution  to  Exercise  3.12.3  (p.  155) 

B.  Statistic 

Solution  to  Exercise  3.12.4  (p.  155) 

D.  Sample 

Solution  to  Exercise  3.12.5  (p.  155) 

E.  Variable 

Solution  to  Exercise  3.12.6  (p.  156) 
quantitative  -  continuous 
Solution  to  Exercise  3.12.7  (p.  156) 

a.  2.27 

b.  3.04 

c. -1, 4, 4

Chapter 4

Discrete Random Variables


4.1 Discrete Random Variables¹

4.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

•  Recognize  and  understand  discrete  probability  distribution  functions,  in  general. 

•  Calculate  and  interpret  expected  values. 

•  Recognize  the  binomial  probability  distribution  and  apply  it  appropriately 

•  Recognize  the  Poisson  probability  distribution  and  apply  it  appropriately  (optional). 

•  Recognize  the  geometric  probability  distribution  and  apply  it  appropriately  (optional). 

•  Recognize  the  hypergeometric  probability  distribution  and  apply  it  appropriately  (optional). 

•  Classify  discrete  word  problems  by  their  distributions. 

4.1.2  Introduction 

A student takes a 10 question true-false quiz. Because the student had such a busy schedule, he or she
could  not  study  and  randomly  guesses  at  each  answer.  What  is  the  probability  of  the  student  passing  the 
test  with  at  least  a  70%? 

Small  companies  might  be  interested  in  the  number  of  long  distance  phone  calls  their  employees  make 
during  the  peak  time  of  the  day.  Suppose  the  average  is  20  calls.  What  is  the  probability  that  the  employees 
make  more  than  20  long  distance  phone  calls  during  the  peak  time? 

These two examples illustrate two different types of probability problems involving discrete random
variables. Recall that discrete data are data that you can count. A random variable describes the outcomes
of a statistical experiment in words. The values of a random variable can vary with each repetition of an
experiment.

In  this  chapter,  you  will  study  probability  problems  involving  discrete  random  distributions.  You  will  also 
study  long-term  averages  associated  with  them. 

4.1.3  Random  Variable  Notation 

Upper  case  letters  like  X  or  Y  denote  a  random  variable.  Lower  case  letters  like  x  or  y  denote  the  value  of  a 
random  variable.  If  X  is  a  random  variable,  then  X  is  written  in  words,  and  x  is  given  as  a  number. 

¹This content is available online at <http://cnx.org/content/m16825/1.14/>.

For example, let X = the number of heads you get when you toss three fair coins. The sample space for the
toss of three fair coins is TTT; THH; HTH; HHT; HTT; THT; TTH; HHH. Then, x = 0, 1, 2, 3. X is in
words and x is a number. Notice that for this example, the x values are countable outcomes. Because you
can count the possible values that X can take on and the outcomes are random (the x values 0, 1, 2, 3), X is
a discrete random variable.
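The three-coin example can be reproduced by listing the sample space and counting heads in each outcome. This is a minimal sketch; the variable names are illustrative.

```python
from itertools import product
from collections import Counter

# Every outcome of tossing three coins, written as strings such as "HTH".
sample_space = ["".join(toss) for toss in product("HT", repeat=3)]

# x = the number of heads in each outcome; X is "the number of heads".
heads_per_outcome = [outcome.count("H") for outcome in sample_space]

# How many outcomes produce each value of x.
outcomes_by_x = Counter(heads_per_outcome)

print(len(sample_space))              # 8
print(sorted(outcomes_by_x.items()))  # [(0, 1), (1, 3), (2, 3), (3, 1)]
```

The distinct values of x are 0, 1, 2, 3, a countable list, which is why X is discrete.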

4.1.4  Optional  Collaborative  Classroom  Activity 

Toss  a  coin  10  times  and  record  the  number  of  heads.  After  all  members  of  the  class  have  completed  the 
experiment  (tossed  a  coin  10  times  and  counted  the  number  of  heads),  fill  in  the  chart  using  a  heading  like 
the  one  below.  Let  X  =  the  number  of  heads  in  10  tosses  of  the  coin. 


x      Frequency of x      Relative Frequency of x
____   ________            ________

Table 4.1


•  Which  value(s)  of  x  occurred  most  frequently? 

•  If  you  tossed  the  coin  1,000  times,  what  values  could  x  take  on?  Which  value(s)  of  x  do  you  think 
would  occur  most  frequently? 

•  What  does  the  relative  frequency  column  sum  to? 

4.2 Probability Distribution Function (PDF) for a Discrete Random Variable²

A discrete probability distribution function has two characteristics:

• Each probability is between 0 and 1, inclusive.

• The sum of the probabilities is 1.

Example  4.1 

A child psychologist is interested in the number of times a newborn baby's crying wakes its mother
after midnight. For a random sample of 50 mothers, the following information was obtained. Let
X = the number of times a newborn wakes its mother after midnight. For this example, x = 0, 1, 2,
3, 4, 5.

P(x)  =  probability  that  X  takes  on  a  value  x. 
²This content is available online at <http://cnx.org/content/m16831/1.14/>.

x    P(x)
0    P(x=0) = 2/50
1    P(x=1) = 11/50
2    P(x=2) = 23/50
3    P(x=3) = 9/50
4    P(x=4) = 4/50
5    P(x=5) = 1/50

Table 4.2

X takes on the values 0, 1, 2, 3, 4, 5. This is a discrete PDF because

1. Each P(x) is between 0 and 1, inclusive.

2. The sum of the probabilities is 1, that is,

2/50 + 11/50 + 23/50 + 9/50 + 4/50 + 1/50 = 1

Example  4.2 

Suppose  Nancy  has  classes  3  days  a  week.  She  attends  classes  3  days  a  week  80%  of  the  time,  2 
days  15%  of  the  time,  1  day  4%  of  the  time,  and  no  days  1%  of  the  time.  Suppose  one  week  is 
randomly  selected. 

Problem  1  (Solution  on  p.  213.) 

Let X = the number of days Nancy ________.

Problem  2  (Solution  on  p.  213.) 

X  takes  on  what  values? 

Problem  3  (Solution  on  p.  213.) 

Suppose one week is randomly chosen. Construct a probability distribution table (called a PDF
table) like the one in the previous example. The table should have two columns labeled x and P(x).
What does the P(x) column sum to?


4.3 Mean or Expected Value and Standard Deviation³

The expected value is often referred to as the "long-term" average or mean. This means that over the long
term of doing an experiment over and over, you would expect this average.

The mean of a random variable X is μ. If we do an experiment many times (for instance, flip a fair coin, as
Karl Pearson did, 24,000 times and let X = the number of heads) and record the value of X each time, the
average is likely to get closer and closer to μ as we keep repeating the experiment. This is known as the
Law of Large Numbers.

NOTE: To find the expected value or long term average, μ, simply multiply each value of the
random variable by its probability and add the products.

³This content is available online at <http://cnx.org/content/m16828/1.16/>.

A Step-by-Step Example

A men's soccer team plays soccer 0, 1, or 2 days a week. The probability that they play 0 days is 0.2, the
probability that they play 1 day is 0.5, and the probability that they play 2 days is 0.3. Find the long-term
average, μ, or expected value of the days per week the men's soccer team plays soccer.

To do the problem, first let the random variable X = the number of days the men's soccer team plays soccer
per week. X takes on the values 0, 1, 2. Construct a PDF table, adding a column xP(x). In this column,
you will multiply each x value by its probability.

Expected Value Table

x    P(x)    xP(x)
0    0.2     (0)(0.2) = 0
1    0.5     (1)(0.5) = 0.5
2    0.3     (2)(0.3) = 0.6

Table 4.4: This table is called an expected value table. The table helps you calculate the expected value or
long-term average.

Add the last column to find the long term average or expected value:

(0)(0.2) + (1)(0.5) + (2)(0.3) = 0 + 0.5 + 0.6 = 1.1.

The expected value is 1.1. The men's soccer team would, on the average, expect to play soccer 1.1 days
per week. The number 1.1 is the long term average or expected value if the men's soccer team plays soccer
week after week after week. We say μ = 1.1
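The expected value table translates into a one-line sum: multiply each x by P(x) and add. A minimal sketch, assuming the soccer PDF above and a hypothetical helper name expected_value:

```python
# P(x) for the number of days per week the team plays soccer.
pdf = {0: 0.2, 1: 0.5, 2: 0.3}

def expected_value(table):
    """Multiply each value by its probability and add the products."""
    return sum(x * p for x, p in table.items())

mu = expected_value(pdf)
print(round(mu, 4))  # 1.1
```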

Example  4.3 

Find  the  expected  value  for  the  example  about  the  number  of  times  a  newborn  baby's  crying 
wakes  its  mother  after  midnight.  The  expected  value  is  the  expected  number  of  times  a  newborn 
wakes  its  mother  after  midnight. 


x    P(x)               xP(x)
0    P(x=0) = 2/50      (0)(2/50) = 0
1    P(x=1) = 11/50     (1)(11/50) = 11/50
2    P(x=2) = 23/50     (2)(23/50) = 46/50
3    P(x=3) = 9/50      (3)(9/50) = 27/50
4    P(x=4) = 4/50      (4)(4/50) = 16/50
5    P(x=5) = 1/50      (5)(1/50) = 5/50

Table 4.5: You expect a newborn to wake its mother after midnight 2.1 times, on the average.

Add the last column to find the expected value. μ = Expected Value = 105/50 = 2.1
Problem

Go back and calculate the expected value for the number of days Nancy attends classes a week.
Construct the third column to do so.

Solution

2.74 days a week.

Example 4.4

Suppose  you  play  a  game  of  chance  in  which  five  numbers  are  chosen  from  0, 1,  2,  3,  4,  5,  6,  7,  8, 
9.  A  computer  randomly  selects  five  numbers  from  0  to  9  with  replacement.  You  pay  $2  to  play 
and  could  profit  $100,000  if  you  match  all  5  numbers  in  order  (you  get  your  $2  back  plus  $100,000). 
Over  the  long  term,  what  is  your  expected  profit  of  playing  the  game? 

To do this problem, set up an expected value table for the amount of money you can profit.

Let X = the amount of money you profit. The values of x are not 0, 1, 2, 3, 4, 5, 6, 7, 8, 9. Since you
are interested in your profit (or loss), the values of x are 100,000 dollars and -2 dollars.

To win, you must get all 5 numbers correct, in order. The probability of choosing one correct
number is 1/10 because there are 10 numbers. You may choose a number more than once. The
probability of choosing all 5 numbers correctly and in order is:

(1/10)(1/10)(1/10)(1/10)(1/10) = 1 × 10^(-5) = 0.00001   (4.2)

Therefore,  the  probability  of  winning  is  0.00001  and  the  probability  of  losing  is 

1  -  0.00001  =  0.99999  (4.3) 
The expected value table is as follows.

          x          P(x)       xP(x)
Loss      -2         0.99999    (-2)(0.99999) = -1.99998
Profit    100,000    0.00001    (100000)(0.00001) = 1

Table 4.6: Add the last column. -1.99998 + 1 = -0.99998


Since -0.99998 is about -1, you would, on the average, expect to lose approximately one dollar
for  each  game  you  play.  However,  each  time  you  play,  you  either  lose  $2  or  profit  $100,000.  The  $1 
is  the  average  or  expected  LOSS  per  game  after  playing  this  game  over  and  over. 
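The arithmetic of Example 4.4 can be checked with exact fractions. In this sketch the names p_win, p_lose, and profit_pdf are illustrative; the dollar amounts and the 1/10 chance per digit come from the example.

```python
from fractions import Fraction

# Five digits must each match, in order, with probability 1/10 apiece.
p_win = Fraction(1, 10) ** 5
p_lose = 1 - p_win

# Profit in dollars -> probability.
profit_pdf = {100_000: p_win, -2: p_lose}

# Expected profit: multiply each profit by its probability and add.
expected_profit = sum(x * p for x, p in profit_pdf.items())

print(float(p_win))            # 1e-05
print(float(expected_profit))  # -0.99998
```

The negative sign is what makes this an expected loss of about one dollar per game.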

Example  4.5 

Suppose  you  play  a  game  with  a  biased  coin.  You  play  each  game  by  tossing  the  coin  once. 
P(heads) = 2/3 and P(tails) = 1/3. If you toss a head, you pay $6. If you toss a tail, you win $10.
If  you  play  this  game  many  times,  will  you  come  out  ahead? 

Problem  1  (Solution  on  p.  213.) 

Define  a  random  variable  X. 


Problem 2 (Solution on p. 213.)

Complete the following expected value table.

WIN     10          ________    ________
LOSE    ________    ________    -12/3

Table 4.7

Problem 3 (Solution on p. 213.)

What is the expected value, μ? Do you come out ahead?

Like data, probability distributions have standard deviations. To calculate the standard deviation (σ) of a
probability distribution, find each deviation from its expected value, square it, multiply it by its probability,
add the products, and take the square root. To understand how to do the calculation, look at the table for
the number of days per week a men's soccer team plays soccer. To find the standard deviation, add the
entries in the column labeled (x - μ)² · P(x) and take the square root.


x    P(x)    xP(x)             (x - μ)² · P(x)
0    0.2     (0)(0.2) = 0      (0 - 1.1)² (0.2) = 0.242
1    0.5     (1)(0.5) = 0.5    (1 - 1.1)² (0.5) = 0.005
2    0.3     (2)(0.3) = 0.6    (2 - 1.1)² (0.3) = 0.243

Table 4.8


Add the last column in the table. 0.242 + 0.005 + 0.243 = 0.490. The standard deviation is the square root
of 0.49. σ = √0.49 = 0.7

Generally for probability distributions, we use a calculator or a computer to calculate μ and σ to reduce
roundoff error. For some probability distributions, there are short-cut formulas that calculate μ and σ.
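As a sketch of the computer calculation mentioned above, the standard deviation of the soccer distribution can be computed in a few lines. The variable names are illustrative.

```python
import math

# P(x) for the number of days per week the team plays soccer.
pdf = {0: 0.2, 1: 0.5, 2: 0.3}

# Expected value: sum of x * P(x).
mu = sum(x * p for x, p in pdf.items())

# Variance: sum of (x - mu)^2 * P(x); the standard deviation is its square root.
variance = sum((x - mu) ** 2 * p for x, p in pdf.items())
sigma = math.sqrt(variance)

print(round(variance, 4), round(sigma, 4))  # 0.49 0.7
```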

4.4 Common Discrete Probability Distribution Functions⁴

Some of the more common discrete probability functions are binomial, geometric, hypergeometric, and
Poisson. Most elementary courses do not cover the geometric, hypergeometric, and Poisson. Your
instructor will let you know if he or she wishes to cover these distributions.

A probability distribution function is a pattern. You try to fit a probability problem into a pattern or
distribution in order to perform the necessary calculations. These distributions are tools to make solving
probability problems easier. Each distribution has its own special characteristics. Learning the characteristics
enables you to distinguish among the different distributions.

4.5 Binomial⁵

The  characteristics  of  a  binomial  experiment  are: 

1.  There  are  a  fixed  number  of  trials.  Think  of  trials  as  repetitions  of  an  experiment.  The  letter  n  denotes 
the  number  of  trials. 

2. There are only 2 possible outcomes, called "success" and "failure", for each trial. The letter p denotes
the probability of a success on one trial and q denotes the probability of a failure on one trial. p + q = 1.

3. The n trials are independent and are repeated using identical conditions. Because the n trials are
independent, the outcome of one trial does not help in predicting the outcome of another trial. Another
way of saying this is that for each individual trial, the probability, p, of a success and probability, q,
of a failure remain the same. For example, randomly guessing at a true-false statistics question has
only two outcomes. If a success is guessing correctly, then a failure is guessing incorrectly. Suppose
Joe always guesses correctly on any statistics true-false question with probability p = 0.6. Then,
q = 0.4. This means that for every true-false statistics question Joe answers, his probability of success
(p = 0.6) and his probability of failure (q = 0.4) remain the same.

⁴This content is available online at <http://cnx.org/content/m16821/1.6/>.
⁵This content is available online at <http://cnx.org/content/m16820/1.17/>.

The outcomes of a binomial experiment fit a binomial probability distribution. The random variable X =
the number of successes obtained in the n independent trials.

The mean, μ, and variance, σ², for the binomial probability distribution are μ = np and σ² = npq. The
standard deviation, σ, is then σ = √(npq).


Any experiment that has characteristics 2 and 3 and where n = 1 is called a Bernoulli Trial (named after
Jacob Bernoulli who, in the late 1600s, studied them extensively). A binomial experiment takes place when
the number of successes is counted in one or more Bernoulli Trials.
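The binomial formulas μ = np, σ² = npq, and σ = √(npq) are simple enough to sketch in code. The helper name binomial_summary is hypothetical, and n = 10 is an assumed quiz length chosen for illustration; p = 0.6 is Joe's success probability from the text.

```python
import math

def binomial_summary(n, p):
    """Return (mean, variance, standard deviation) of a binomial B(n, p)."""
    q = 1 - p                 # probability of a failure on one trial
    mean = n * p              # mu = np
    variance = n * p * q      # sigma^2 = npq
    return mean, variance, math.sqrt(variance)

# Assumed: Joe answers n = 10 true-false questions with p = 0.6 on each.
mean, variance, sd = binomial_summary(10, 0.6)
print(round(mean, 4), round(variance, 4), round(sd, 4))  # 6.0 2.4 1.5492
```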

Example  4.6 

At  ABC  College,  the  withdrawal  rate  from  an  elementary  physics  course  is  30%  for  any  given 
term.  This  implies  that,  for  any  given  term,  70%  of  the  students  stay  in  the  class  for  the  entire 
term.  A  "success"  could  be  defined  as  an  individual  who  withdrew.  The  random  variable  is  X  = 
the  number  of  students  who  withdraw  from  the  randomly  selected  elementary  physics  class. 

Example  4.7 

Suppose  you  play  a  game  that  you  can  only  either  win  or  lose.  The  probability  that  you  win  any 

game  is  55%  and  the  probability  that  you  lose  is  45%.  Each  game  you  play  is  independent.  If  you 
play  the  game  20  times,  what  is  the  probability  that  you  win  15  of  the  20  games?  Here,  if  you 
define X = the number of wins, then X takes on the values 0, 1, 2, 3, ..., 20. The probability of a
success is p = 0.55. The probability of a failure is q = 0.45. The number of trials is n = 20. The
probability question can be stated mathematically as P(x = 15).

Example  4.8 

A  fair  coin  is  flipped  15  times.  Each  flip  is  independent.  What  is  the  probability  of  getting  more 
than 10 heads? Let X = the number of heads in 15 flips of the fair coin. X takes on the values 0, 1,
2, 3, ..., 15. Since the coin is fair, p = 0.5 and q = 0.5. The number of trials is n = 15. The probability
question can be stated mathematically as P(x > 10).
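The probability questions in Examples 4.7 and 4.8 can be evaluated with the standard binomial formula P(x) = C(n, x) p^x q^(n - x). The text has not introduced this formula at this point, so the sketch below is an assumption about how one might compute the answers; binom_pmf is a hypothetical helper name.

```python
import math

def binom_pmf(n, p, x):
    """P(exactly x successes in n independent trials with success probability p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

# Example 4.7: P(x = 15) with n = 20, p = 0.55.
p_15_wins = binom_pmf(20, 0.55, 15)

# Example 4.8: P(x > 10) with n = 15, p = 0.5, summing x = 11, 12, ..., 15.
p_more_than_10 = sum(binom_pmf(15, 0.5, x) for x in range(11, 16))

print(p_15_wins)
print(round(p_more_than_10, 4))  # 0.0592
```

For the fair coin, the sum equals 1941/32768, because 1941 of the 32768 equally likely sequences of 15 flips contain more than 10 heads.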

Example  4.9 

Approximately  70%  of  statistics  students  do  their  homework  in  time  for  it  to  be  collected  and 
graded.  Each  student  does  homework  independently.  In  a  statistics  class  of  50  students,  what  is 
the  probability  that  at  least  40  will  do  their  homework  on  time?  Students  are  selected  randomly. 

Problem 1 (Solution on p. 213.)

This is a binomial problem because there is only a success or a ________, there are a definite
number of trials, and the probability of a success is 0.70 for each trial.

Problem 2 (Solution on p. 213.)

If we are interested in the number of students who do their homework, then how do we define X?

Problem 3 (Solution on p. 213.)

What values does x take on?

Problem 4 (Solution on p. 213.)

What is a "failure", in words?

The probability of a success is p = 0.70. The number of trials is n = 50.

Problem 5 (Solution on p. 213.)

If p + q = 1, then what is q?

Problem 6 (Solution on p. 213.)

The words "at least" translate as what kind of inequality for the probability question P(x ____ 40).

The probability question is P(x ≥ 40).


4.5.1  Notation  for  the  Binomial:  B  =  Binomial  Probability  Distribution  Function 

X ~ B(n, p)

Read this as "X is a random variable with a binomial distribution." The parameters are n and p. n = number
of trials; p = probability of a success on each trial.
Example  4.10 

It has been stated that about 41% of adult workers have a high school diploma but do not pursue
any further education. If 20 adult workers are randomly selected, find the probability that at most
12 of them have a high school diploma but do not pursue any further education. How many adult
workers do you expect to have a high school diploma but do not pursue any further education?

Let X = the number of workers who have a high school diploma but do not pursue any further
education.

X takes on the values 0, 1, 2, ..., 20 where n = 20 and p = 0.41. q = 1 - 0.41 = 0.59. X ~ B(20, 0.41)

Find P(x ≤ 12). P(x ≤ 12) = 0.9738. (calculator or computer)

Using the TI-83+ or the TI-84 calculators, the calculations are as follows. Go into 2nd DISTR. The
syntax for the instructions is as follows.

To calculate P(x = value): binompdf(n, p, number). If "number" is left out, the result is the binomial
probability table.

To calculate P(x ≤ value): binomcdf(n, p, number). If "number" is left out, the result is the
cumulative binomial probability table.

For this problem: After you are in 2nd DISTR, arrow down to binomcdf. Press ENTER. Enter
20, .41, 12). The result is P(x ≤ 12) = 0.9738.

NOTE: If you want to find P(x = 12), use the pdf (binompdf). If you want to find P(x > 12), use 1 -
binomcdf(20, .41, 12).

The probability that at most 12 workers have a high school diploma but do not pursue any further
education is 0.9738.
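For readers without a TI calculator, the same two quantities can be sketched in Python. The functions below are hypothetical stand-ins named after the calculator commands; they are not the calculator's implementation, only the standard binomial formula summed by hand.

```python
import math

def binompdf(n, p, x):
    """P(exactly x successes); plays the role of the calculator's binompdf."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binomcdf(n, p, x):
    """P(at most x successes); plays the role of the calculator's binomcdf."""
    return sum(binompdf(n, p, k) for k in range(0, x + 1))

at_most_12 = binomcdf(20, 0.41, 12)    # the text reports 0.9738
more_than_12 = 1 - at_most_12          # complement, as in the NOTE above
print(round(at_most_12, 4), round(more_than_12, 4))
```

The complement line mirrors the NOTE: P(x > 12) is 1 minus the cumulative probability up to and including 12.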

The graph of X ~ B(20, 0.41) is:

[Figure: graph of the binomial distribution B(20, 0.41).] The y-axis contains the probability of x, where
X = the number of workers who have only a high school diploma.

The number of adult workers that you expect to have a high school diploma but not pursue any
further education is the mean, μ = np = (20)(0.41) = 8.2.

The formula for the variance is σ² = npq. The standard deviation is σ = √(npq).
σ = √((20)(0.41)(0.59)) = 2.20.

Example  4.11 

The following example illustrates a problem that is not binomial. It violates the condition of
independence. ABC College has a student advisory committee made up of 10 staff members and
6 students. The committee wishes to choose a chairperson and a recorder. What is the
probability that the chairperson and recorder are both students? All names of the committee are put
into a box and two names are drawn without replacement. The first name drawn determines the
chairperson and the second name the recorder. There are two trials. However, the trials are not
independent because the outcome of the first trial affects the outcome of the second trial. The
probability of a student on the first draw is 6/16. The probability of a student on the second draw
is 5/15, when the first draw produces a student. The probability is 6/15 when the first draw produces
a staff member. The probability of drawing a student's name changes for each of the trials and,
therefore, violates the condition of independence.
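The changing probabilities in Example 4.11 can be written out explicitly. This sketch uses the committee sizes from the example; the variable names are illustrative, and the final product applies the multiplication rule for dependent events to answer the question posed in the example.

```python
from fractions import Fraction

students, staff = 6, 10
total = students + staff  # 16 names in the box

# First draw: 6 student names out of 16.
p_first_student = Fraction(students, total)

# Second draw, without replacement: one name is already gone.
p_second_student_given_student = Fraction(students - 1, total - 1)  # 5/15
p_second_student_given_staff = Fraction(students, total - 1)        # 6/15

# The second-draw probability depends on the first draw, so the trials are dependent.
trials_independent = p_second_student_given_student == p_second_student_given_staff

# Multiplication rule: P(both students) = P(first) * P(second | first).
p_both_students = p_first_student * p_second_student_given_student
print(trials_independent, p_both_students)  # False 1/8
```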


4.6 Geometric (optional)⁶

The  characteristics  of  a  geometric  experiment  are: 

1.  There  are  one  or  more  Bernoulli  trials  with  all  failures  except  the  last  one,  which  is  a  success.  In  other 
words,  you  keep  repeating  what  you  are  doing  until  the  first  success.  Then  you  stop.  For  example, 
you  throw  a  dart  at  a  bull's  eye  until  you  hit  the  bull's  eye.  The  first  time  you  hit  the  bull's  eye  is  a 
"success"  so  you  stop  throwing  the  dart.  It  might  take  you  6  tries  until  you  hit  the  bull's  eye.  You  can 
think  of  the  trials  as  failure,  failure,  failure,  failure,  failure,  success.  STOP. 

2.  In  theory,  the  number  of  trials  could  go  on  forever.  There  must  be  at  least  one  trial. 

3. The probability, p, of a success and the probability, q, of a failure are the same for each trial. p + q = 1
and q = 1 - p. For example, the probability of rolling a 3 when you throw one fair die is 1/6. This is
true no matter how many times you roll the die. Suppose you want to know the probability of getting
the first 3 on the fifth roll. On rolls 1, 2, 3, and 4, you do not get a face with a 3. The probability for
each of rolls 1, 2, 3, and 4 is q = 5/6, the probability of a failure. The probability of getting a 3 on the
fifth roll is (5/6)(5/6)(5/6)(5/6)(1/6) = 0.0804.

This content is available online at <http://cnx.org/content/m16822/1.16/>.

Available for free at Connexions <http://cnx.org/content/col10522/1.40>

X = the number of independent trials until the first success. The mean and variance are in the summary in
this chapter.
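
The die calculation above follows a general pattern: x - 1 failures followed by one success. The sketch below writes that pattern as a Python function; the name geometric_pmf is an assumption made for illustration.

```python
def geometric_pmf(p, x):
    """P(X = x): the first success occurs on trial x (x - 1 failures, then a success)."""
    if x < 1:
        raise ValueError("x must be at least 1")
    q = 1 - p
    return q ** (x - 1) * p

# First 3 on the fifth roll of a fair die: p = 1/6, x = 5
print(round(geometric_pmf(1 / 6, 5), 4))  # 0.0804
```

Summing geometric_pmf over x = 1, 2, 3, ... approaches 1, which reflects the fact that the number of trials has no fixed upper limit.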

Example  4.12 

You  play  a  game  of  chance  that  you  can  either  win  or  lose  (there  are  no  other  possibilities)  until 
you  lose.  Your  probability  of  losing  is  p  =  0.57.  What  is  the  probability  that  it  takes  5  games  until 
you  lose?  Let  X  =  the  number  of  games  you  play  until  you  lose  (includes  the  losing  game).  Then 
X takes on the values 1, 2, 3, ... (could go on indefinitely). The probability question is P(x = 5).
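
In this example the "success" of the geometric experiment is a loss, so p = 0.57 plays the role of the success probability. A minimal sketch of the calculation, assuming the geometric model described above (the numerical answer is not given in the text and is computed here):

```python
p = 0.57            # probability of losing a single game (the "success" that stops play)
q = 1 - p           # probability of winning a single game

# Five games until the first loss: four wins followed by one loss
prob_five_games = q ** 4 * p
print(round(prob_five_games, 4))  # 0.0195
```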

Example  4.13 

A safety engineer feels that 35% of all industrial accidents in her plant are caused by failure of
employees to follow instructions. She decides to look at the accident reports (selected randomly
and replaced in the pile after reading) until she finds one that shows an accident caused by failure
of employees to follow instructions. On the average, how many reports would the safety engineer
expect to look at until she finds a report showing an accident caused by employee failure to follow
instructions? What is the probability that the safety engineer will have to examine at least 3 reports
until she finds a report showing an accident caused by employee failure to follow instructions?

Let X = the number of accident reports the safety engineer must examine until she finds a report showing
an accident caused by employee failure to follow instructions. X takes on the values 1, 2, 3, .... The
first question asks you to find the expected value or the mean. The second question asks you to
find $P(x \geq 3)$. ("At least" translates as a "greater than or equal to" symbol.)
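
Both questions can be answered directly from the geometric model. The sketch below is one way to do so in Python; it assumes p = 0.35 from the example and uses the mean formula from the chapter summary. The numerical answers are computed here rather than quoted from the text.

```python
p = 0.35                     # probability that a report shows failure to follow instructions
q = 1 - p

# Expected number of reports examined: mu = 1/p
expected_reports = 1 / p

# "At least 3 reports" means the first two reports are both failures: P(x >= 3) = q^2
p_at_least_3 = q ** 2

print(round(expected_reports, 2), round(p_at_least_3, 4))  # 2.86 0.4225
```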

Example  4.14 

Suppose  that  you  are  looking  for  a  student  at  your  college  who  lives  within  five  miles  of  you.  You 
know that 55% of the 25,000 students do live within five miles of you. You randomly contact students
from the college until one says he/she lives within five miles of you. What is the probability
that  you  need  to  contact  four  people? 

This  is  a  geometric  problem  because  you  may  have  a  number  of  failures  before  you  have  the  one 
success  you  desire.  Also,  the  probability  of  a  success  stays  the  same  each  time  you  ask  a  student  if 
he/she  lives  within  five  miles  of  you.  There  is  no  definite  number  of  trials  (number  of  times  you 
ask  a  student). 

Problem  1 

Let X = the number of ________ you must ask ________ one says yes.

Solution 

Let  X  =  the  number  of  students  you  must  ask  until  one  says  yes. 

Problem  2  (Solution  on  p.  213.) 

What  values  does  X  take  on? 

Problem  3  (Solution  on  p.  213.) 

What  are  p  and  q? 

Problem  4  (Solution  on  p.  214.) 

The  probability  question  is  P(  ). 
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4.6.1  Notation  for  the  Geometric:  G  =  Geometric  Probability  Distribution  Function 

X ~ G(p)

Read  this  as  "X  is  a  random  variable  with  a  geometric  distribution."  The  parameter  is  p.  p  =  the  probability 
of  a  success  for  each  trial. 
Example  4.15 

Assume  that  the  probability  of  a  defective  computer  component  is  0.02.  Components  are  randomly 
selected.  Find  the  probability  that  the  first  defect  is  caused  by  the  7th  component  tested.  How 
many  components  do  you  expect  to  test  until  one  is  found  to  be  defective? 

Let  X  =  the  number  of  computer  components  tested  until  the  first  defect  is  found. 

X  takes  on  the  values  1,  2,  3, ...  where  p  =  0.02.  X  ~  G(0.02) 

Find P(x = 7). P(x = 7) = 0.0177 (calculator or computer).

TI-83+ and TI-84: For a general discussion, see this example (binomial). The syntax is similar.
The geometric parameter list is (p, number). If "number" is left out, the result is the geometric
probability table. For this problem: After you are in 2nd DISTR, arrow down to D:geometpdf.
Press ENTER. Enter .02,7). The result is P(x = 7) = 0.0177.

The  probability  that  the  7th  component  is  the  first  defect  is  0.0177. 

The graph of X ~ G(0.02) is:

[Graph omitted. The x-axis shows the values x = 1, 2, 3, 4, ....]

The y-axis contains the probability of x, where X = the number of computer components tested.

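The number of components that you would expect to test until you find the first defective one is
the mean, $\mu = 50$.

The formula for the mean is $\mu = \frac{1}{p} = \frac{1}{0.02} = 50$

The formula for the variance is $\sigma^2 = \frac{1}{p}\left(\frac{1}{p} - 1\right) = \frac{1}{0.02}\left(\frac{1}{0.02} - 1\right) = 2450$

The standard deviation is $\sigma = \sqrt{\frac{1}{p}\left(\frac{1}{p} - 1\right)} = \sqrt{\frac{1}{0.02}\left(\frac{1}{0.02} - 1\right)} = 49.5$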
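
For readers working on a computer rather than a TI calculator, the figures in this example can be reproduced with a few lines of Python. This is a sketch under the geometric model with p = 0.02; the variable names are illustrative.

```python
import math

p = 0.02                 # probability that a component is defective
q = 1 - p

p_x7 = q ** 6 * p        # six good components, then the first defect on the 7th
mean = 1 / p             # mu = 1/p
variance = (1 / p) * (1 / p - 1)
sd = math.sqrt(variance)

print(round(p_x7, 4), round(mean), round(variance), round(sd, 1))
```

Rounded as in the text, these are 0.0177, 50, 2450, and 49.5.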
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The  characteristics  of  a  hypergeometric  experiment  are: 
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1.  You  take  samples  from  2  groups. 

2.  You  are  concerned  with  a  group  of  interest,  called  the  first  group. 

3. You sample without replacement from the combined groups. For example, you want to choose a
softball team from a combined group of 11 men and 13 women. The team consists of 10 players.

4. Each pick is not independent, since sampling is without replacement. In the softball example, the
probability of picking a woman first is 13/24. The probability of picking a man second is 11/23 if a woman
was picked first. It is 10/23 if a man was picked first. The probability of the second pick depends on what
happened in the first pick.

5.  You  are  not  dealing  with  Bernoulli  Trials. 

The outcomes of a hypergeometric experiment fit a hypergeometric probability distribution. The random
variable X = the number of items from the group of interest. The mean and variance are given in the
summary.

Example  4.16 

A  candy  dish  contains  100  jelly  beans  and  80  gumdrops.  Fifty  candies  are  picked  at  random.  What 
is  the  probability  that  35  of  the  50  are  gumdrops?  The  two  groups  are  jelly  beans  and  gumdrops. 
Since  the  probability  question  asks  for  the  probability  of  picking  gumdrops,  the  group  of  interest 
(first  group)  is  gumdrops.  The  size  of  the  group  of  interest  (first  group)  is  80.  The  size  of  the 
second  group  is  100.  The  size  of  the  sample  is  50  (jelly  beans  or  gumdrops).  Let  X  =  the  number 
of gumdrops in the sample of 50. X takes on the values x = 0, 1, 2, ..., 50. The probability question
is P(x = 35).

Example  4.17 

Suppose  a  shipment  of  100  VCRs  is  known  to  have  10  defective  VCRs.  An  inspector  randomly 
chooses  12  for  inspection.  He  is  interested  in  determining  the  probability  that,  among  the  12,  at 
most  2  are  defective.  The  two  groups  are  the  90  non-defective  VCRs  and  the  10  defective  VCRs. 
The  group  of  interest  (first  group)  is  the  defective  group  because  the  probability  question  asks  for 
the  probability  of  at  most  2  defective  VCRs.  The  size  of  the  sample  is  12  VCRs.  (They  may  be 
non-defective or defective.) Let X = the number of defective VCRs in the sample of 12. X takes on
the values 0, 1, 2, ..., 10. X may not take on the values 11 or 12. The sample size is 12, but there
are only 10 defective VCRs. The inspector wants to know $P(x \leq 2)$ ("At most" means "less than or
equal to").
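
The "at most 2" probability can be computed by adding the hypergeometric probabilities for x = 0, 1, and 2. The sketch below uses the standard counting form of the hypergeometric probability (ways to choose x from the group of interest, times ways to choose the rest from the second group, divided by all possible samples). The function name hypergeom_pmf is an assumption for illustration, and the result is computed here rather than taken from the text.

```python
from math import comb

def hypergeom_pmf(r, b, n, x):
    """P(X = x) when n items are drawn without replacement from r items of interest and b others."""
    if x < 0 or x > r or n - x < 0 or n - x > b:
        return 0.0
    return comb(r, x) * comb(b, n - x) / comb(r + b, n)

# Example 4.17: r = 10 defective VCRs, b = 90 non-defective, n = 12 inspected
p_at_most_2 = sum(hypergeom_pmf(10, 90, 12, x) for x in range(3))
print(round(p_at_most_2, 4))
```

The guard clause returns 0 for impossible values such as x = 11 or 12, matching the remark that X cannot exceed the 10 defective VCRs.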

Example  4.18 

You  are  president  of  an  on-campus  special  events  organization.  You  need  a  committee  of  7  to  plan 
a  special  birthday  party  for  the  president  of  the  college.  Your  organization  consists  of  18  women 
and  15  men.  You  are  interested  in  the  number  of  men  on  your  committee.  If  the  members  of  the 
committee  are  randomly  selected,  what  is  the  probability  that  your  committee  has  more  than  4 
men? 

This  is  a  hypergeometric  problem  because  you  are  choosing  your  committee  from  two  groups 
(men  and  women). 

Problem  1  (Solution  on  p.  214.) 

Are  you  choosing  with  or  without  replacement? 

Problem  2  (Solution  on  p.  214.) 

What  is  the  group  of  interest? 

This content is available online at <http://cnx.org/content/m16824/1.16/>.
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Problem  3 

How many are in the group of interest? (Solution on p. 214.)

Problem 4 (Solution on p. 214.)

How many are in the other group?

Problem 5 (Solution on p. 214.)

Let X = ________ on the committee. What values does X take on?

Problem 6 (Solution on p. 214.)

The probability question is P(________).


4.7.1 Notation for the Hypergeometric: H = Hypergeometric Probability Distribution Function

X ~ H(r, b, n)

Read  this  as  "X  is  a  random  variable  with  a  hypergeometric  distribution."  The  parameters  are  r,  b,  and  n.  r 
=  the  size  of  the  group  of  interest  (first  group),  b  =  the  size  of  the  second  group,  n  =  the  size  of  the  chosen 
sample 

Example  4.19 

A  school  site  committee  is  to  be  chosen  randomly  from  6  men  and  5  women.  If  the  committee 
consists  of  4  members  chosen  randomly,  what  is  the  probability  that  2  of  them  are  men?  How 
many  men  do  you  expect  to  be  on  the  committee? 

Let  X  =  the  number  of  men  on  the  committee  of  4.  The  men  are  the  group  of  interest  (first  group). 
X takes on the values 0, 1, 2, 3, 4, where r = 6, b = 5, and n = 4. X ~ H(6, 5, 4)
Find P(x = 2). P(x = 2) = 0.4545 (calculator or computer)

NOTE:  Currently,  the  TI-83+  and  TI-84  do  not  have  hypergeometric  probability  functions.  There 
are  a  number  of  computer  packages,  including  Microsoft  Excel,  that  do. 


The  probability  that  there  are  2  men  on  the  committee  is  about  0.45. 


The  graph  of  X~H(6, 5, 4)  is: 
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The  y-axis  contains  the  probability  of  X,  where  X  =  the  number  of  men  on  the  committee. 
You would expect $\mu = 2.18$ (about 2) men on the committee.
The formula for the mean is $\mu = \frac{nr}{r + b} = \frac{(4)(6)}{6 + 5} = 2.18$

The  formula  for  the  variance  is  fairly  complex.  You  will  find  it  in  the  Summary  of  the  Discrete 
Probability  Functions  Chapter  (Section  4.9). 
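
Since the TI-83+ and TI-84 lack a hypergeometric function, the two results in this example can be checked with Python's standard library. This is a minimal sketch using the counting form of the hypergeometric probability; the variable names are illustrative.

```python
from math import comb

r, b, n = 6, 5, 4    # men (group of interest), women, committee size

# P(x = 2): choose 2 of the 6 men and the remaining 2 from the 5 women
p_two_men = comb(r, 2) * comb(b, n - 2) / comb(r + b, n)

# Expected number of men: mu = nr / (r + b)
mean_men = n * r / (r + b)

print(round(p_two_men, 4), round(mean_men, 2))  # 0.4545 2.18
```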


4.8 Poisson

Characteristics  of  a  Poisson  experiment: 

1. The Poisson gives the probability of a number of events occurring in a fixed interval of time or space if
these  events  happen  with  a  known  average  rate  and  independently  of  the  time  since  the  last  event.  For 
example,  a  book  editor  might  be  interested  in  the  number  of  words  spelled  incorrectly  in  a  particular 
book.  It  might  be  that,  on  the  average,  there  are  5  words  spelled  incorrectly  in  100  pages.  The  interval 
is  the  100  pages. 

2.  The  Poisson  may  be  used  to  approximate  the  binomial  if  the  probability  of  success  is  "small"  (such 
as  0.01)  and  the  number  of  trials  is  "large"  (such  as  1000).  You  will  verify  the  relationship  in  the 
homework exercises. n is the number of trials and p is the probability of a "success."

For a Poisson probability distribution, the random variable X = the number of occurrences in the interval of
interest. The mean and variance are given in the summary.
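
The second characteristic, that the Poisson approximates the binomial when p is small and n is large, can be illustrated numerically. The sketch below uses the values mentioned above (n = 1000, p = 0.01) and the standard probability formulas for the two distributions; the function names are assumptions for illustration, and the particular x values printed are arbitrary.

```python
from math import comb, exp, factorial

def binomial_pmf(n, p, x):
    """P(X = x) for a binomial with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def poisson_pmf(mu, x):
    """P(X = x) for a Poisson with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)

n, p = 1000, 0.01     # "large" n and "small" p
mu = n * p            # Poisson mean matched to the binomial mean np

for x in (5, 10, 15):
    print(x, round(binomial_pmf(n, p, x), 4), round(poisson_pmf(mu, x), 4))
```

The two columns of probabilities should agree closely, which is the relationship the homework exercises ask you to verify.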

Example  4.20 

The  average  number  of  loaves  of  bread  put  on  a  shelf  in  a  bakery  in  a  half-hour  period  is  12.  Of 
interest  is  the  number  of  loaves  of  bread  put  on  the  shelf  in  5  minutes.  The  time  interval  of  interest 
is  5  minutes.  What  is  the  probability  that  the  number  of  loaves,  selected  randomly,  put  on  the  shelf 
in  5  minutes  is  3? 

Let X = the number of loaves of bread put on the shelf in 5 minutes. If the average number of
loaves put on the shelf in 30 minutes (half-hour) is 12, then the average number of loaves put on
the shelf in 5 minutes is

$\frac{5}{30} \cdot 12 = 2$ loaves of bread

The probability question asks you to find P(x = 3).

This content is available online at <http://cnx.org/content/m16829/1.16/>.
Example  4.21 

A  certain  bank  expects  to  receive  6  bad  checks  per  day,  on  average.  What  is  the  probability  of  the 
bank  getting  fewer  than  5  bad  checks  on  any  given  day?  Of  interest  is  the  number  of  checks  the 
bank  receives  in  1  day,  so  the  time  interval  of  interest  is  1  day.  Let  X  =  the  number  of  bad  checks 
the  bank  receives  in  one  day.  If  the  bank  expects  to  receive  6  bad  checks  per  day  then  the  average 
is 6 checks per day. The probability question asks for P(x < 5).
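
The probability questions in Examples 4.20 and 4.21 can be evaluated with the standard Poisson probability formula. The following Python sketch assumes the means worked out in the two examples (2 loaves per 5 minutes, 6 bad checks per day); the function name poisson_pmf is illustrative, and the numerical answers are computed here rather than quoted from the text.

```python
from math import exp, factorial

def poisson_pmf(mu, x):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)

# Example 4.20: 12 loaves per 30 minutes scales to 2 loaves per 5 minutes
mu_bread = (5 / 30) * 12
p_three_loaves = poisson_pmf(mu_bread, 3)

# Example 4.21: "fewer than 5" means x = 0, 1, 2, 3, or 4
p_fewer_than_5 = sum(poisson_pmf(6, x) for x in range(5))

print(round(p_three_loaves, 4), round(p_fewer_than_5, 4))  # 0.1804 0.2851
```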

Example  4.22 

You notice that a news reporter says "uh", on average, 2 times per broadcast. What is the probability
that the news reporter says "uh" more than 2 times per broadcast?

This is a Poisson problem because you are interested in knowing the number of times the news
reporter says "uh" during a broadcast.

Problem  1  (Solution  on  p.  214.) 

What  is  the  interval  of  interest? 

Problem  2  (Solution  on  p.  214.) 

What  is  the  average  number  of  times  the  news  reporter  says  "uh"  during  one  broadcast? 

Problem  3  (Solution  on  p.  214.) 

Let X = ________. What values does X take on?

Problem  4  (Solution  on  p.  214.) 

The  probability  question  is  P(  ). 


4.8.1  Notation  for  the  Poisson:  P  =  Poisson  Probability  Distribution  Function 

$X \sim P(\mu)$

Read this as "X is a random variable with a Poisson distribution." The parameter is $\mu$ (or $\lambda$). $\mu$ (or $\lambda$) = the
mean for the interval of interest.
Example  4.23 

Leah's  answering  machine  receives  about  6  telephone  calls  between  8  a.m.  and  10  a.m.  What  is 
the  probability  that  Leah  receives  more  than  1  call  in  the  next  15  minutes? 

Let X = the number of calls Leah receives in 15 minutes. (The interval of interest is 15 minutes or
1/4 hour.)

x = 0, 1, 2, 3, ...

If Leah receives, on the average, 6 telephone calls in 2 hours, and there are eight 15-minute
intervals in 2 hours, then Leah receives

$\frac{1}{8} \cdot 6 = 0.75$

calls in 15 minutes, on the average. So, $\mu = 0.75$ for this problem.
X ~ P(0.75)

Find P(x > 1). P(x > 1) = 0.1734 (calculator or computer)
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TI-83+  and  TI-84:  For  a  general  discussion,  see  this  example  (Binomial).  The  syntax  is  similar. 
The Poisson parameter list is ($\mu$ for the interval of interest, number). For this problem:

Press 1- and then press 2nd DISTR. Arrow down to C:poissoncdf. Press ENTER. Enter .75,1).
The result is P(x > 1) = 0.1734. NOTE: The TI calculators use $\lambda$ (lambda) for the mean.

The  probability  that  Leah  receives  more  than  1  telephone  call  in  the  next  fifteen  minutes  is  about 
0.1734. 
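
The calculator steps above compute 1 minus the cumulative probability of at most 1 call. The same calculation can be sketched in Python with the standard Poisson formula; the function name poisson_cdf is an assumption chosen to mirror the calculator's poissoncdf.

```python
from math import exp, factorial

def poisson_cdf(mu, k):
    """P(X <= k) for a Poisson random variable with mean mu."""
    return sum(exp(-mu) * mu ** x / factorial(x) for x in range(k + 1))

mu = 6 / 8                      # 6 calls in 2 hours, eight 15-minute intervals
p_more_than_1 = 1 - poisson_cdf(mu, 1)
print(round(p_more_than_1, 4))  # 0.1734
```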

The graph of X ~ P(0.75) is:

[Graph omitted. The x-axis shows the values x = 0, 1, 2, 3, ....]

The y-axis contains the probability of x, where X = the number of calls in 15 minutes.
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4.9  Summary  of  Functions' 

Formula  4.1:  Binomial 

X ~ B(n, p)

X = the number of successes in n independent trials

n = the number of independent trials

X takes on the values x = 0, 1, 2, 3, ..., n

p = the probability of a success for any trial

q = the probability of a failure for any trial

p + q = 1    q = 1 - p

The mean is $\mu = np$. The standard deviation is $\sigma = \sqrt{npq}$.

Formula  4.2:  Geometric 
X ~ G(p)

X = the number of independent trials until the first success (count the failures and the first success)

X takes on the values x = 1, 2, 3, ...

p = the probability of a success for any trial

q = the probability of a failure for any trial

p + q = 1

q = 1 - p

The mean is $\mu = \frac{1}{p}$

The standard deviation is $\sigma = \sqrt{\frac{1}{p}\left(\frac{1}{p} - 1\right)}$

Formula 4.3: Hypergeometric

X ~ H(r, b, n)

X = the number of items from the group of interest that are in the chosen sample.

X may take on the values x = 0, 1, ..., up to the size of the group of interest. (The minimum value
for X may be larger than 0 in some instances.)

r = the size of the group of interest (first group)

b = the size of the second group

n = the size of the chosen sample.

$n \leq r + b$

The mean is: $\mu = \frac{nr}{r + b}$

The standard deviation is: $\sigma = \sqrt{\frac{rbn(r + b - n)}{(r + b)^2 (r + b - 1)}}$

Formula 4.4: Poisson
$X \sim P(\mu)$

X = the number of occurrences in the interval of interest
X takes on the values x = 0, 1, 2, 3, ...

The mean $\mu$ is typically given. ($\lambda$ is often used as the mean instead of $\mu$.) When the Poisson is
used to approximate the binomial, we use the binomial mean $\mu = np$. n is the binomial number
of trials. p = the probability of a success for each trial. This formula is valid when n is "large" and
p "small" (a general rule is that n should be greater than or equal to 20 and p should be less than
or equal to 0.05). If n is large enough and p is small enough then the Poisson approximates the
binomial very well. The variance is $\sigma^2 = \mu$ and the standard deviation is $\sigma = \sqrt{\mu}$.

This content is available online at <http://cnx.org/content/m16833/1.11/>.
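
The four sets of formulas in this summary can be collected into small helper functions. The Python sketch below is one possible arrangement; the function names are assumptions made for illustration, and each function returns the pair (mean, standard deviation) using the formulas listed above.

```python
import math

def binomial_stats(n, p):
    """Mean np and standard deviation sqrt(npq) for X ~ B(n, p)."""
    return n * p, math.sqrt(n * p * (1 - p))

def geometric_stats(p):
    """Mean 1/p and standard deviation sqrt((1/p)(1/p - 1)) for X ~ G(p)."""
    return 1 / p, math.sqrt((1 / p) * (1 / p - 1))

def hypergeometric_stats(r, b, n):
    """Mean nr/(r+b) and the summary's standard deviation for X ~ H(r, b, n)."""
    total = r + b
    mean = n * r / total
    sd = math.sqrt(r * b * n * (total - n) / (total ** 2 * (total - 1)))
    return mean, sd

def poisson_stats(mu):
    """Mean mu and standard deviation sqrt(mu) for a Poisson with mean mu."""
    return mu, math.sqrt(mu)

# Values taken from the worked examples earlier in the chapter
print(binomial_stats(20, 0.41))
print(geometric_stats(0.02))
print(hypergeometric_stats(6, 5, 4))
print(poisson_stats(0.75))
```

Applied to the chapter's examples, the means and standard deviations round to the values given there (for instance 8.2 and 2.20 for the binomial example, and 50 and 49.5 for the geometric example).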
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4.10  Practice  1:  Discrete  Distribution^" 

4.10.1  Student  Learning  Outcomes 

• The student will analyze the properties of a discrete distribution.

4.10.2  Given: 

A  ballet  instructor  is  interested  in  knowing  what  percent  of  each  year's  class  will  continue  on  to  the  next, 
so  that  she  can  plan  what  classes  to  offer.  Over  the  years,  she  has  established  the  following  probability 
distribution. 

•  Let  X  =  the  number  of  years  a  student  will  study  ballet  with  the  teacher. 

• Let P(x) = the probability that a student will study ballet x years.

4.10.3  Organize  the  Data 

Complete  the  table  below  using  the  data  provided. 


x    P(x)    x*P(x)
1    0.10
2    0.05
3    0.10
4
5    0.30
6    0.20
7    0.10

Table 4.9


Exercise  4.10.1 

In  words,  define  the  Random  Variable  X. 
Exercise  4.10.2 

P(x = 4) =

Exercise 4.10.3

P(x < 4) =

Exercise 4.10.4

On average, how many years would you expect a child to study ballet with this teacher?

4.10.4  Discussion  Question 
Exercise  4.10.5 

What  does  the  column  "P(x)"  sum  to  and  why? 
Exercise  4.10.6 

What  does  the  column  "x  *  P(x)"  sum  to  and  why? 


This content is available online at <http://cnx.org/content/m16830/1.14/>.



4.11 Practice 2: Binomial Distribution

4.11.1  Student  Learning  Outcomes 

•  The  student  will  construct  the  Binomial  Distribution. 


4.11.2  Given 

The  Higher  Education  Research  Institute  at  UCLA  collected  data  from  203,967  incoming  first-time, 
full-time  freshmen  from  270  four-year  colleges  and  universities  in  the  U.S.  71.3%  of  those  students  replied 
that, yes, they believe that same-sex couples should have the right to legal marital status. (Source:
http://heri.ucla.edu/PDFs/pubs/TFS/Norms/Monographs/TheAmericanFreshman2011.pdf)

Suppose that you randomly pick 8 first-time, full-time freshmen from the survey. You are interested
in the number that believes that same-sex couples should have the right to legal marital status.


4.11.3  Interpret  the  Data 

Exercise  4.11.1 

In  words,  define  the  random  Variable  X. 

Exercise  4.11.2 

X~  

Exercise  4.11.3 

What  values  does  the  random  variable  X  take  on? 
Exercise  4.11.4 

Construct the probability distribution function (PDF).


(Solution  on  p.  214.) 
(Solution  on  p.  214.) 
(Solution  on  p.  214.) 


x    P(x)

Table 4.10


Exercise  4.11.5 

On average ($\mu$), how many would you expect to answer yes?
Exercise 4.11.6

What is the standard deviation ($\sigma$)?


This content is available online at <http://cnx.org/content/m17107/1.18/>.


(Solution  on  p.  214.) 
(Solution  on  p.  214.) 
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Exercise  4.11.7  (Solution  on  p.  214.) 

What  is  the  probability  that  at  most  5  of  the  freshmen  reply  "yes"? 

Exercise  4.11.8  (Solution  on  p.  214.) 

What  is  the  probability  that  at  least  2  of  the  freshmen  reply  "yes"? 

Exercise  4.11.9 

Construct  a  histogram  or  plot  a  line  graph.  Label  the  horizontal  and  vertical  axes  with  words. 
Include  numerical  scaling. 
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4.12  Practice  3:  Poisson  Distribution^^ 

4.12.1  Student  Learning  Outcomes 

•  The  student  will  analyze  the  properties  of  a  Poisson  distribution. 


4.12.2  Given 

On average, eight teens in the U.S. die from motor vehicle injuries per day. As
a result, states across the country are debating raising the driving age. (Source:
http://www.cdc.gov/Motorvehiclesafety/Teen_Drivers/teendrivers_factsheet.html)


4.12.3  Interpret  the  Data 
Exercise  4.12.1 

Assume  the  event  occurs  independently  in  any  given  day.  In  words,  define  the  Random  Variable 
X. 

Exercise  4.12.2  (Solution  on  p.  214.) 

X  ~  

Exercise  4.12.3  (Solution  on  p.  214.) 

What  values  does  X  take  on? 

Exercise  4.12.4 

For  the  given  values  of  the  random  variable  X,  fill  in  the  corresponding  probabilities. 


x     P(x)
0
4
8
10
11
15

Table 4.11


Exercise  4.12.5  (Solution  on  p.  214.) 

Is it likely that there will be no teens killed in the U.S. from motor vehicle injuries on any given
day? Justify your answer numerically.

Exercise 4.12.6 (Solution on p. 214.)

Is it likely that there will be more than 20 teens killed in the U.S. from motor vehicle injuries on
any given day? Justify your answer numerically.


This content is available online at <http://cnx.org/content/m17109/1.15/>.
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4.13  Practice  4:  Geometric  Distribution" 

4.13.1  Student  Learning  Outcomes 

• The student will analyze the properties of a geometric distribution.

4.13.2  Given: 

Use  the  information  from  the  Binomial  Distribution  Practice  (Section  4.11)  shown  below. 

The  Higher  Education  Research  Institute  at  UCLA  collected  data  from  203,967  incoming  first-time, 
full-time  freshmen  from  270  four-year  colleges  and  universities  in  the  U.S.  71.3%  of  those  students  replied 
that, yes, they believe that same-sex couples should have the right to legal marital status. (Source:
http://heri.ucla.edu/PDFs/pubs/TFS/Norms/Monographs/TheAmericanFreshman2011.pdf)

Suppose that you randomly select freshmen from the study until you find one who replies "yes."
You are interested in the number of freshmen you must ask.

4.13.3  Interpret  the  Data 
Exercise  4.13.1 

In  words,  define  the  Random  Variable  X. 

Exercise  4.13.2 

X  ~ 

Exercise  4.13.3 

What  values  does  the  random  variable  X  take  on? 
Exercise  4.13.4 

Construct the probability distribution function (PDF). Stop at x = 6.


x    P(x)
1
2
3
4
5
6

Table 4.12


Exercise 4.13.5 (Solution on p. 215.)

On average ($\mu$), how many freshmen would you expect to have to ask until you found one who
replies "yes"?

Exercise 4.13.6 (Solution on p. 215.)

What is the probability that you will need to ask fewer than 3 freshmen?

This content is available online at <http://cnx.org/content/m17108/1.17/>.


(Solution on p. 215.)
(Solution on p. 215.)
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Exercise  4.13.7 

Construct  a  histogram  or  plot  a  line  graph.  Label  the  horizontal  and  vertical  axes  with  words. 
Include  numerical  scaling. 
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4.14  Practice  5:  Hypergeometric  Distribution^^ 

4.14.1  Student  Learning  Outcomes 

• The student will analyze the properties of a hypergeometric distribution.

4.14.2  Given 

Suppose  that  a  group  of  statistics  students  is  divided  into  two  groups:  business  majors  and  non-business 
majors.  There  are  16  business  majors  in  the  group  and  7  non-business  majors  in  the  group.  A  random 
sample  of  9  students  is  taken.  We  are  interested  in  the  number  of  business  majors  in  the  group. 

4.14.3  Interpret  the  Data 
Exercise  4.14.1 

In  words,  define  the  Random  Variable  X. 

Exercise  4.14.2 

X  ~ 

Exercise  4.14.3 

What  values  does  X  take  on? 

Exercise  4.14.4 

Construct the probability distribution function (PDF) for X.


x    P(x)

Table 4.13


Exercise  4.14.5  (Solution  on  p.  215.) 

On average ($\mu$), how many would you expect to be business majors?


(Solution  on  p.  215.) 
(Solution  on  p.  215.) 


This content is available online at <http://cnx.org/content/m17106/1.13/>.
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4.15  Homework'^ 

Exercise  4.15.1  (Solution  on  p.  215.) 

1. Complete the PDF and answer the questions.


x    P(X = x)    x·P(X = x)
0    0.3
1    0.2
2
3    0.4

Table 4.14


a. Find the probability that x = 2.
b. Find the expected value.

Exercise  4.15.2 

Suppose  that  you  are  offered  the  following  "deal."  You  roll  a  die.  If  you  roll  a  6,  you  win  $10.  If 
you  roll  a  4  or  5,  you  win  $5.  If  you  roll  a  1,  2,  or  3,  you  pay  $6. 

a.  What  are  you  ultimately  interested  in  here  (the  value  of  the  roll  or  the  money  you  win)? 

b.  In  words,  define  the  Random  Variable  X. 

c.  List  the  values  that  X  may  take  on. 

d.  Construct  a  PDF. 

e. Over the long run of playing this game, what are your expected average winnings per game?

f. Based on numerical values, should you take the deal? Explain your decision in complete sentences.

Exercise  4.15.3  (Solution  on  p.  215.) 

A  venture  capitalist,  willing  to  invest  $1,000,000,  has  three  investments  to  choose  from.  The  first 
investment,  a  software  company,  has  a  10%  chance  of  returning  $5,000,000  profit,  a  30%  chance  of 
returning  $1,000,000  profit,  and  a  60%  chance  of  losing  the  million  dollars.  The  second  company, 
a  hardware  company,  has  a  20%  chance  of  returning  $3,000,000  profit,  a  40%  chance  of  returning 
$1,000,000  profit,  and  a  40%  chance  of  losing  the  million  dollars.  The  third  company,  a  biotech 
firm, has a 10% chance of returning $6,000,000 profit, a 70% chance of no profit or loss, and a 20% chance
of  losing  the  million  dollars. 

a.  Construct  a  PDF  for  each  investment. 

b.  Find  the  expected  value  for  each  investment. 

c.  Which  is  the  safest  investment?  Why  do  you  think  so? 

d.  Which  is  the  riskiest  investment?  Why  do  you  think  so? 

e.  Which  investment  has  the  highest  expected  return,  on  average? 

Exercise  4.15.4 

A  theater  group  holds  a  fund-raiser.  It  sells  100  raffle  tickets  for  $5  apiece.  Suppose  you  purchase 
4  tickets.  The  prize  is  2  passes  to  a  Broadway  show,  worth  a  total  of  $150. 

a.  What  are  you  interested  in  here? 

This content is available online at <http://cnx.org/content/m16823/1.20/>.
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b.  In  words,  define  the  Random  Variable  X. 

c.  List  the  values  that  X  may  take  on. 

d.  Construct  a  PDF. 

e. If this fund-raiser is repeated often and you always purchase 4 tickets, what would be your
expected average winnings per raffle?

Exercise  4.15.5  (Solution  on  p.  215.) 

Suppose  that  20,000  married  adults  in  the  United  States  were  randomly  surveyed  as  to  the  number 
of children they have. The results are compiled and are used as theoretical probabilities. Let X =
the number of children.


x              P(X = x)    x·P(X = x)
0              0.10
1              0.20
2              0.30
3
4              0.10
5              0.05
6 (or more)    0.05

Table 4.15


a.  Find  the  probability  that  a  married  adult  has  3  children. 

b.  In  words,  what  does  the  expected  value  in  this  example  represent? 

c.  Find  the  expected  value. 

d. Is it more likely that a married adult will have 2 - 3 children or 4 - 6 children? How do you
know?

Exercise  4.15.6 

Suppose  that  the  PDF  for  the  number  of  years  it  takes  to  earn  a  Bachelor  of  Science  (B.S.)  degree 
is  given  below. 


x    P(X = x)
3    0.05
4    0.40
5    0.30
6    0.15
7    0.10

Table 4.16


a.  In  words,  define  the  Random  Variable  X. 

b.  What  does  it  mean  that  the  values  0, 1,  and  2  are  not  included  for  x  in  the  PDF? 

c.  On  average,  how  many  years  do  you  expect  it  to  take  for  an  individual  to  earn  a  B.S.? 

4.15.1 For each problem:

a.  In  words,  define  the  Random  Variable  X. 

b.  List  the  values  that  X  may  take  on. 

c. Give the distribution of X. X ~ ______

Then,  answer  the  questions  specific  to  each  individual  problem. 

Exercise  4.15.7  (Solution  on  p.  215.) 

Six different colored dice are rolled. Of interest is the number of dice that show a "1."

d.  On  average,  how  many  dice  would  you  expect  to  show  a  "1"? 

e.  Find  the  probability  that  all  six  dice  show  a  "1." 

f. Is it more likely that 3 or that 4 dice will show a "1"? Use numbers to justify your answer numerically.

Exercise  4.15.8 

More than 96 percent of the very largest colleges and universities (more than 15,000 total enrollments) have some online offerings. Suppose you randomly pick 13 such institutions. We are interested in the number that offer distance learning courses. (Source: http://en.wikipedia.org/wiki/Distance_education)

d.  On  average,  how  many  schools  would  you  expect  to  offer  such  courses? 

e.  Find  the  probability  that  at  most  6  offer  such  courses. 

f. Is it more likely that 0 or that 13 will offer such courses? Use numbers to justify your answer numerically and answer in a complete sentence.

Exercise  4.15.9  (Solution  on  p.  215.) 

A  school  newspaper  reporter  decides  to  randomly  survey  12  students  to  see  if  they  will  attend  Tet 
(Vietnamese  New  Year)  festivities  this  year.  Based  on  past  years,  she  knows  that  18%  of  students 
attend  Tet  festivities.  We  are  interested  in  the  number  of  students  who  will  attend  the  festivities. 

d.  How  many  of  the  12  students  do  we  expect  to  attend  the  festivities? 

e.  Find  the  probability  that  at  most  4  students  will  attend. 

f. Find the probability that more than 2 students will attend.
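
For binomial problems like the one above, each requested probability is a sum of terms of the form C(n, k) p^k (1 - p)^(n - k). The following sketch shows that calculation for Exercise 4.15.9, assuming X ~ B(12, 0.18). The helper names binomial_pmf and binomial_cdf are illustrative, not calculator functions from the text.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binomial_cdf(k, n, p):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))

n, p = 12, 0.18                            # 12 students surveyed, 18% attend
mean = n * p                               # expected number attending
p_at_most_4 = binomial_cdf(4, n, p)        # P(X <= 4)
p_more_than_2 = 1 - binomial_cdf(2, n, p)  # P(X > 2) = 1 - P(X <= 2)

print(mean, round(p_at_most_4, 4), round(p_more_than_2, 4))
```

"More than 2" is computed through the complement because the complement needs only three terms.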

Exercise  4.15.10 

Suppose  that  about  85%  of  graduating  students  attend  their  graduation.  A  group  of  22  graduating 
students  is  randomly  chosen. 

d.  How  many  are  expected  to  attend  their  graduation? 

e.  Find  the  probability  that  17  or  18  attend. 

f. Based on numerical values, would you be surprised if all 22 attended graduation? Justify your answer numerically.

Exercise  4.15.11  (Solution  on  p.  216.) 

At  The  Fencing  Center,  60%  of  the  fencers  use  the  foil  as  their  main  weapon.  We  randomly  survey 
25  fencers  at  The  Fencing  Center.  We  are  interested  in  the  numbers  that  do  not  use  the  foil  as  their 
main  weapon. 

d.  How  many  are  expected  to  not  use  the  foil  as  their  main  weapon? 

e.  Find  the  probability  that  six  do  not  use  the  foil  as  their  main  weapon. 

f. Based on numerical values, would you be surprised if all 25 did not use foil as their main weapon? Justify your answer numerically.

Exercise 4.15.12

Approximately 8% of students at a local high school participate in after-school sports all four years of high school. A group of 60 seniors is randomly chosen. Of interest is the number that participated in after-school sports all four years of high school.

d. How many seniors are expected to have participated in after-school sports all four years of high school?

e. Based on numerical values, would you be surprised if none of the seniors participated in after-school sports all four years of high school? Justify your answer numerically.

f. Based upon numerical values, is it more likely that 4 or that 5 of the seniors participated in after-school sports all four years of high school? Justify your answer numerically.

Exercise  4.15.13  (Solution  on  p.  216.) 

The chance of having an extra fortune in a fortune cookie is about 3%. Given a bag of 144 fortune cookies, we are interested in the number of cookies with an extra fortune. Two distributions may be used to solve this problem. Use one distribution to solve the problem.

d.  How  many  cookies  do  we  expect  to  have  an  extra  fortune? 

e.  Find  the  probability  that  none  of  the  cookies  have  an  extra  fortune. 

f. Find the probability that more than 3 have an extra fortune.

g. As n increases, what happens involving the probabilities using the two distributions? Explain in complete sentences.
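
The two distributions that fit Exercise 4.15.13 can be compared side by side. The sketch below assumes they are the binomial B(144, 0.03) and the Poisson with mean np = 4.32, and prints both PDFs for a few values. The function names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, mu):
    """P(X = k) for X ~ P(mu)."""
    return exp(-mu) * mu**k / factorial(k)

n, p = 144, 0.03
mu = n * p   # mean used for the Poisson version of the problem

# Print the first few probabilities from each distribution.
for k in range(4):
    print(k, round(binomial_pmf(k, n, p), 4), round(poisson_pmf(k, mu), 4))

# Largest difference between the two PDFs over every possible count.
max_gap = max(abs(binomial_pmf(k, n, p) - poisson_pmf(k, mu))
              for k in range(n + 1))
print(max_gap)
```

Rerunning the comparison with a larger n and a smaller p that keep np fixed is one way to explore part g.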

Exercise  4.15.14 

There  are  two  games  played  for  Chinese  New  Year  and  Vietnamese  New  Year.  They  are  almost 
identical.  In  the  Chinese  version,  fair  dice  with  numbers  1,  2,  3,  4,  5,  and  6  are  used,  along  with 
a  board  with  those  numbers.  In  the  Vietnamese  version,  fair  dice  with  pictures  of  a  gourd,  fish, 
rooster,  crab,  crayfish,  and  deer  are  used.  The  board  has  those  six  objects  on  it,  also.  We  will  play 
with  bets  being  $1.  The  player  places  a  bet  on  a  number  or  object.  The  "house"  rolls  three  dice.  If 
none  of  the  dice  show  the  number  or  object  that  was  bet,  the  house  keeps  the  $1  bet.  If  one  of  the 
dice  shows  the  number  or  object  bet  (and  the  other  two  do  not  show  it),  the  player  gets  back  his 
$1  bet,  plus  $1  profit.  If  two  of  the  dice  show  the  number  or  object  bet  (and  the  third  die  does  not 
show  it),  the  player  gets  back  his  $1  bet,  plus  $2  profit.  If  all  three  dice  show  the  number  or  object 
bet,  the  player  gets  back  his  $1  bet,  plus  $3  profit. 

Let X = number of matches and Y = profit per game.

d. List the values that Y may take on. Then, construct one PDF table that includes both X and Y and their probabilities.

e. Calculate the average expected matches over the long run of playing this game for the player.

f. Calculate the average expected earnings over the long run of playing this game for the player.

g.  Determine  who  has  the  advantage,  the  player  or  the  house. 
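
The rules of the game in Exercise 4.15.14 translate directly into a PDF. The sketch below assumes the three dice are independent, so that X ~ B(3, 1/6), and maps each number of matches to the profit described in the problem. Exact fractions are used to avoid rounding, and the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

p = Fraction(1, 6)                     # chance that one die matches the bet
profit = {0: -1, 1: 1, 2: 2, 3: 3}     # Y in dollars for each value of X

# X = number of matching dice out of three independent rolls
pmf = {x: comb(3, x) * p**x * (1 - p)**(3 - x) for x in range(4)}

# The probabilities in a PDF must add up to 1.
assert sum(pmf.values()) == 1

expected_matches = sum(x * pmf[x] for x in pmf)
expected_profit = sum(profit[x] * pmf[x] for x in pmf)

print(expected_matches, expected_profit)
```

The sign of the expected profit shows which side the game favors over the long run.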

Exercise  4.15.15  (Solution  on  p.  216.) 

According to the South Carolina Department of Mental Health web site, for every 200 U.S. women, the average number who suffer from anorexia is one (http://www.state.sc.us/dmh/anorexia/statistics.htm). Out of a randomly chosen group of 600 U.S. women:

d.  How  many  are  expected  to  suffer  from  anorexia? 

e.  Find  the  probability  that  no  one  suffers  from  anorexia. 

http://www.state.sc.us/dmh/anorexia/statistics.htm

Available for free at Connexions <http://cnx.org/content/col10522/1.40>




CHAPTER  4.  DISCRETE  RANDOM  VARIABLES 


f. Find the probability that more than four suffer from anorexia.

Exercise 4.15.16

The  average  number  of  children  a  Japanese  woman  has  in  her  lifetime  is  1.37.  Suppose  that  one 
Japanese  woman  is  randomly  chosen. 

(Source: MHLW's Pamphlet, http://www.mhlw.go.jp/english/policy/children/children-childrearing/index.html)

d.  Find  the  probability  that  she  has  no  children. 

e.  Find  the  probability  that  she  has  fewer  children  than  the  Japanese  average. 

f. Find the probability that she has more children than the Japanese average.

Exercise  4.15.17  (Solution  on  p.  216.) 

The average number of children a Spanish woman has in her lifetime is 1.47. Suppose that one Spanish woman is randomly chosen. (http://www.typicallyspanish.com/news/publish/article_4897.shtml)

d.  Find  the  probability  that  she  has  no  children. 

e.  Find  the  probability  that  she  has  fewer  children  than  the  Spanish  average. 

f. Find the probability that she has more children than the Spanish average.

Exercise  4.15.18 

Fertile (female) cats produce an average of 3 litters per year. (Source: The Humane Society of the United States.) Suppose that one fertile, female cat is randomly chosen. In one year, find the probability she produces:

d.  No  litters. 

e.  At  least  2  litters. 

f.  Exactly  3  litters. 

Exercise  4.15.19  (Solution  on  p.  216.) 

A  consumer  looking  to  buy  a  used  red  Miata  car  will  call  dealerships  until  she  finds  a  dealership 
that  carries  the  car.  She  estimates  the  probability  that  any  independent  dealership  will  have  the 
car  will  be  28%.  We  are  interested  in  the  number  of  dealerships  she  must  call. 

d. On average, how many dealerships would we expect her to have to call until she finds one that has the car?

e.  Find  the  probability  that  she  must  call  at  most  4  dealerships. 

f.  Find  the  probability  that  she  must  call  3  or  4  dealerships. 
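
When the random variable counts trials until the first success, the geometric distribution applies. The sketch below works through Exercise 4.15.19 under the assumption X ~ G(0.28), with calls treated as independent. The helper names geometric_pmf and geometric_cdf are illustrative.

```python
def geometric_pmf(k, p):
    """P(X = k): the first success happens on trial k (k = 1, 2, 3, ...)."""
    return (1 - p)**(k - 1) * p

def geometric_cdf(k, p):
    """P(X <= k): at least one success within the first k trials."""
    return 1 - (1 - p)**k

p = 0.28                              # chance a dealership has the car
mean_calls = 1 / p                    # expected number of calls
p_at_most_4 = geometric_cdf(4, p)     # P(X <= 4)
p_3_or_4 = geometric_pmf(3, p) + geometric_pmf(4, p)

print(round(mean_calls, 4), round(p_at_most_4, 4), round(p_3_or_4, 4))
```

The cumulative form follows from the complement: the only way to need more than k calls is for the first k calls to fail.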

Exercise  4.15.20 

Suppose that the probability that an adult in America will watch the Super Bowl is 40%. Each person is considered independent. We are interested in the number of adults in America we must survey until we find one who will watch the Super Bowl.

d. How many adults in America do you expect to survey until you find one who will watch the Super Bowl?

e.  Find  the  probability  that  you  must  ask  7  people. 

f. Find the probability that you must ask 3 or 4 people.

http://www.mhlw.go.jp/english/policy/children/children-childrearing/index.html
http://www.typicallyspanish.com/news/publish/article_4897.shtml

Exercise 4.15.21 (Solution on p. 216.)

A  group  of  Martial  Arts  students  is  planning  on  participating  in  an  upcoming  demonstration. 
6  are  students  of  Tae  Kwon  Do;  7  are  students  of  Shotokan  Karate.  Suppose  that  8  students  are 
randomly  picked  to  be  in  the  first  demonstration.  We  are  interested  in  the  number  of  Shotokan 
Karate students in that first demonstration. Hint: Use the Hypergeometric distribution. Look in
the Formulas section of 4: Discrete Distributions and in the Appendix Formulas.

d.  How  many  Shotokan  Karate  students  do  we  expect  to  be  in  that  first  demonstration? 

e.  Find  the  probability  that  4  students  of  Shotokan  Karate  are  picked  for  the  first  demonstration. 

f. Suppose that we are interested in the Tae Kwon Do students that are picked for the first demonstration. Find the probability that all 6 students of Tae Kwon Do are picked for the first demonstration.
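
Because students are picked without replacement from two groups, the hypergeometric formula applies, as the hint says. The sketch below evaluates it for Exercise 4.15.21 with exact fractions. The function name hypergeom_pmf and its parameter names are illustrative.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(k, group_size, other_size, draws):
    """P(exactly k of the draws come from the group of interest)
    when sampling without replacement from the two groups combined."""
    return Fraction(comb(group_size, k) * comb(other_size, draws - k),
                    comb(group_size + other_size, draws))

karate, tkd, picked = 7, 6, 8

# Mean of a hypergeometric variable: draws * group_size / total
expected_karate = Fraction(picked * karate, karate + tkd)

p_four_karate = hypergeom_pmf(4, karate, tkd, picked)
p_all_six_tkd = hypergeom_pmf(6, tkd, karate, picked)

print(float(expected_karate), float(p_four_karate), float(p_all_six_tkd))
```

For part f the roles of the two groups are swapped, so Tae Kwon Do becomes the group of interest.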

Exercise  4.15.22 

The chance of an IRS audit for a tax return with over $25,000 in income is about 2% per year. We
are  interested  in  the  expected  number  of  audits  a  person  with  that  income  has  in  a  20  year  period. 
Assume  each  year  is  independent. 

d.  How  many  audits  are  expected  in  a  20  year  period? 

e.  Find  the  probability  that  a  person  is  not  audited  at  all. 

f. Find the probability that a person is audited more than twice.

Exercise  4.15.23  (Solution  on  p.  216.) 

Refer to the previous problem. Suppose that 100 people with tax returns over $25,000 are randomly picked. We are interested in the number of people audited in 1 year. One way to solve this problem is by using the Binomial Distribution. Since n is large and p is small, another discrete distribution could be used to solve the following problems. Solve the following questions (d-f) using that distribution.

d.  How  many  are  expected  to  be  audited? 

e.  Find  the  probability  that  no  one  was  audited. 

f. Find the probability that more than 2 were audited.

Exercise  4.15.24 

Suppose that a technology task force is being formed to study technology awareness among instructors. Assume that 10 people will be randomly chosen to be on the committee from a group of 28 volunteers, 20 who are technically proficient and 8 who are not. We are interested in the number on the committee who are not technically proficient.

d.  How  many  instructors  do  you  expect  on  the  committee  who  are  not  technically  proficient? 

e.  Find  the  probability  that  at  least  5  on  the  committee  are  not  technically  proficient. 

f. Find the probability that at most 3 on the committee are not technically proficient.

Exercise  4.15.25  (Solution  on  p.  217.) 

Refer back to Exercise 4.15.12. Solve this problem again, using a different, though still acceptable, distribution.

Exercise  4.15.26 

Suppose that 9 Massachusetts athletes are scheduled to appear at a charity benefit. The 9 are randomly chosen from 8 volunteers from the Boston Celtics and 4 volunteers from the New England Patriots. We are interested in the number of Patriots picked.

d. Is it more likely that there will be 2 Patriots or 3 Patriots picked?

e. What is the probability that all of the volunteers will be from the Celtics?

f. Is it more likely that more of the volunteers will be from the Patriots or from the Celtics? How do you know?

Exercise  4.15.27  (Solution  on  p.  217.) 

On  average,  Pierre,  an  amateur  chef,  drops  3  pieces  of  egg  shell  into  every  2  batters  of  cake  he 
makes.  Suppose  that  you  buy  one  of  his  cakes. 

d.  On  average,  how  many  pieces  of  egg  shell  do  you  expect  to  be  in  the  cake? 

e.  What  is  the  probability  that  there  will  not  be  any  pieces  of  egg  shell  in  the  cake? 

f. Let's say that you buy one of Pierre's cakes each week for 6 weeks. What is the probability that there will not be any egg shell in any of the cakes?

g. Based upon the average given for Pierre, is it possible for there to be 7 pieces of shell in the cake? Why?
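
Counts of events at a known average rate can be modeled with the Poisson distribution. The sketch below treats Exercise 4.15.27 that way, assuming one cake is made from one batter, so the mean is 1.5 pieces per cake, and assuming the six weekly cakes are independent. The function name poisson_pmf is illustrative.

```python
from math import exp, factorial

def poisson_pmf(k, mu):
    """P(X = k) for X ~ P(mu)."""
    return exp(-mu) * mu**k / factorial(k)

mu = 3 / 2                                # 3 pieces per 2 batters, one batter per cake (assumed)
p_no_shell = poisson_pmf(0, mu)           # no shell in one cake
p_no_shell_six_weeks = p_no_shell ** 6    # six independent cakes, all shell-free
p_seven_pieces = poisson_pmf(7, mu)       # small, but not zero

print(round(p_no_shell, 4), round(p_no_shell_six_weeks, 6), p_seven_pieces)
```

Multiplying the six weekly probabilities is valid only under the independence assumption stated above.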

Exercise  4.15.28 

It has been estimated that only about 30% of California residents have adequate earthquake supplies. Suppose we are interested in the number of California residents we must survey until we find a resident who does not have adequate earthquake supplies.

d. What is the probability that we must survey just 1 or 2 residents until we find a California resident who does not have adequate earthquake supplies?

e. What is the probability that we must survey at least 3 California residents until we find a California resident who does not have adequate earthquake supplies?

f. How many California residents do you expect to need to survey until you find a California resident who does not have adequate earthquake supplies?

g. How many California residents do you expect to need to survey until you find a California resident who does have adequate earthquake supplies?

Exercise  4.15.29  (Solution  on  p.  217.) 

Refer to the above problem. Suppose you randomly survey 11 California residents. We are interested in the number who have adequate earthquake supplies.

d. What is the probability that at least 8 have adequate earthquake supplies?

e. Is it more likely that none or that all of the residents surveyed will have adequate earthquake supplies? Why?

f. How many residents do you expect will have adequate earthquake supplies?

The  next  2  questions  refer  to  the  following:  In  one  of  its  Spring  catalogs,  L.L.  Bean®  advertised  footwear  on 
29  of  its  192  catalog  pages. 

Exercise  4.15.30 

Suppose  we  randomly  survey  20  pages.  We  are  interested  in  the  number  of  pages  that  advertise 
footwear.  Each  page  may  be  picked  at  most  once. 

d.  How  many  pages  do  you  expect  to  advertise  footwear  on  them? 

e.  Is  it  probable  that  all  20  will  advertise  footwear  on  them?  Why  or  why  not? 

f. What is the probability that less than 10 will advertise footwear on them?

Exercise  4.15.31  (Solution  on  p.  217.) 

Suppose  we  randomly  survey  20  pages.  We  are  interested  in  the  number  of  pages  that  advertise 
footwear.  This  time,  each  page  may  be  picked  more  than  once. 

d. How many pages do you expect to advertise footwear on them?

e.  Is  it  probable  that  all  20  will  advertise  footwear  on  them?  Why  or  why  not? 

f.  What  is  the  probability  that  less  than  10  will  advertise  footwear  on  them? 

g. Reminder: A page may be picked more than once. We are interested in the number of pages that we must randomly survey until we find one that has footwear advertised on it. Define the random variable X and give its distribution.

h. What is the probability that you only need to survey at most 3 pages in order to find one that advertises footwear on it?

i.  How  many  pages  do  you  expect  to  need  to  survey  in  order  to  find  one  that  advertises  footwear? 

Exercise  4.15.32 

Suppose  that  you  roll  a  fair  die  until  each  face  has  appeared  at  least  once.  It  does  not  matter  in 
what  order  the  numbers  appear.  Find  the  expected  number  of  rolls  you  must  make  until  each  face 
has  appeared  at  least  once. 
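
One way to approach Exercise 4.15.32 is to split the experiment into stages. Once a given number of distinct faces have appeared, the number of rolls needed to see a new face is geometric, with success probability equal to the fraction of faces not yet seen, and the expected values of the stages add. The sketch below carries out that sum with exact fractions. This is one possible method, not necessarily the one the authors intended, and the function name is illustrative.

```python
from fractions import Fraction

def expected_rolls_to_see_all(faces=6):
    """Sum the expected waiting time of each stage.

    With `seen` distinct faces already observed, the next new face
    appears with probability (faces - seen) / faces per roll, so that
    stage takes 1 / p_new rolls on average (geometric mean).
    """
    total = Fraction(0)
    for seen in range(faces):
        p_new = Fraction(faces - seen, faces)
        total += 1 / p_new
    return total

rolls = expected_rolls_to_see_all(6)
print(rolls, float(rolls))
```

The first stage always takes exactly one roll, because any face is new at the start.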


4.15.2  Try  these  multiple  choice  problems. 

For  the  next  three  problems:  The  probability  that  the  San  Jose  Sharks  will  win  any  given  game  is  0.3694 
based  on  a  13  year  win  history  of  382  wins  out  of  1034  games  played  (as  of  a  certain  date).  An  upcoming 
monthly  schedule  contains  12  games. 
Let  X  =  the  number  of  games  won  in  that  upcoming  month. 

Exercise  4.15.33  (Solution  on  p.  217.) 

The  expected  number  of  wins  for  that  upcoming  month  is: 

A.  1.67 

B.  12 

C. 382/1043

D.  4.43 

Exercise  4.15.34  (Solution  on  p.  217.) 

What  is  the  probability  that  the  San  Jose  Sharks  win  6  games  in  that  upcoming  month? 

A.  0.1476 

B.  0.2336 

C.  0.7664 

D.  0.8903 

Exercise  4.15.35  (Solution  on  p.  217.) 

What is the probability that the San Jose Sharks win at least 5 games in that upcoming month?

A.  0.3694 

B.  0.5266 

C.  0.4734 

D.  0.2305 

For  the  next  two  questions:  The  average  number  of  times  per  week  that  Mrs.  Plum's  cats  wake  her  up  at 
night  because  they  want  to  play  is  10.  We  are  interested  in  the  number  of  times  her  cats  wake  her  up  each 
week. 

Exercise  4.15.36  (Solution  on  p.  217.) 

In  words,  the  random  variable  X  = 

A. The number of times Mrs. Plum's cats wake her up each week

B.  The  number  of  times  Mrs.  Plum's  cats  wake  her  up  each  hour 

C.  The  number  of  times  Mrs.  Plum's  cats  wake  her  up  each  night 

D.  The  number  of  times  Mrs.  Plum's  cats  wake  her  up 

Exercise  4.15.37  (Solution  on  p.  217.) 

Find  the  probability  that  her  cats  will  wake  her  up  no  more  than  5  times  next  week. 

A.  0.5000 

B.  0.9329 

C.  0.0378 

D.  0.0671 


Exercise  4.15.38  (Solution  on  p.  217.) 

People visiting video rental stores often rent more than one DVD at a time. The probability distribution for DVD rentals per customer at Video To Go is given below. There is a 5-video limit per customer at this store, so nobody ever rents more than 5 DVDs.


x           0       1       2       3       4       5
P(X = x)    0.03    0.50    0.24    ?       0.07    0.04

Table 4.17


A.  Describe  the  random  variable  X  in  words. 

B.  Find  the  probability  that  a  customer  rents  three  DVDs. 

C.  Find  the  probability  that  a  customer  rents  at  least  4  DVDs. 

D.  Find  the  probability  that  a  customer  rents  at  most  2  DVDs. 

Another shop, Entertainment Headquarters, rents DVDs and videogames. The probability distribution for DVD rentals per customer at this shop is given below. They also have a 5 DVD limit per customer.


x           0       1       2       3       4       5
P(X = x)    0.35    0.25    0.20    0.10    0.05    0.05

Table 4.18


E.  At  which  store  is  the  expected  number  of  DVDs  rented  per  customer  higher? 

F. If Video to Go estimates that they will have 300 customers next week, how many DVDs do they expect to rent next week? Answer in sentence form.

G. If Video to Go expects 300 customers next week and Entertainment HQ projects that they will have 420 customers, for which store is the expected number of DVD rentals for next week higher? Explain.

H. Which of the two video stores experiences more variation in the number of DVD rentals per customer? How do you know that?

Exercise  4.15.39  (Solution  on  p.  217.) 

A game involves selecting a card from a deck of cards and tossing a coin. The deck has 52 cards and 12 cards are "face cards" (Jack, Queen, or King). The coin is a fair coin and is equally likely to land on Heads or Tails.

• If the card is a face card and the coin lands on Heads, you win $6

•  If  the  card  is  a  face  card  and  the  coin  lands  on  Tails,  you  win  $2 

•  If  the  card  is  not  a  face  card,  you  lose  $2,  no  matter  what  the  coin  shows. 


A.  Find  the  expected  value  for  this  game  (expected  net  gain  or  loss). 

B. Explain what your calculations indicate about your long-term average profits and losses on this game.

C.  Should  you  play  this  game  to  win  money? 
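
The expected value of the game in Exercise 4.15.39 can be built up from its three outcomes. The sketch below assumes the card draw and the coin toss are independent, which the problem implies but does not state, and uses exact fractions. The variable names are illustrative.

```python
from fractions import Fraction

p_face = Fraction(12, 52)    # 12 face cards in a 52-card deck
p_heads = Fraction(1, 2)     # fair coin

# Each entry is (probability of the outcome, net gain in dollars).
outcomes = [
    (p_face * p_heads, 6),          # face card and Heads
    (p_face * (1 - p_heads), 2),    # face card and Tails
    (1 - p_face, -2),               # not a face card, coin is irrelevant
]

# The outcomes cover every possibility exactly once.
assert sum(prob for prob, _ in outcomes) == 1

expected_gain = sum(prob * gain for prob, gain in outcomes)
print(expected_gain, float(expected_gain))
```

A negative expected gain means an average loss per play over many plays.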

Exercise  4.15.40  (Solution  on  p.  218.) 

You buy a lottery ticket to a lottery that costs $10 per ticket. There are only 100 tickets available to be sold in this lottery. In this lottery there is one $500 prize, 2 $100 prizes and 4 $25 prizes. Find your expected gain or loss.

Exercise  4.15.41  (Solution  on  p.  218.) 

A student takes a 10 question true-false quiz, but did not study and randomly guesses each answer. Find the probability that the student passes the quiz with a grade of at least 70% of the questions correct.

Exercise  4.15.42  (Solution  on  p.  218.) 

A  student  takes  a  32  question  multiple  choice  exam,  but  did  not  study  and  randomly  guesses  each 
answer.  Each  question  has  3  possible  choices  for  the  answer.  Find  the  probability  that  the  student 
guesses  more  than  75%  of  the  questions  correctly. 

Exercise  4.15.43  (Solution  on  p.  219.) 

Suppose that you are performing the probability experiment of rolling one fair six-sided die. Let F be the event of rolling a "4" or a "5". You are interested in how many times you need to roll the die in order to obtain the first "4 or 5" as the outcome.

•  p  =  probability  of  success  (event  F  occurs) 

•  q  =  probability  of  failure  (event  F  does  not  occur) 


A. Write the description of the random variable X. What are the values that X can take on? Find the values of p and q.

B. Find the probability that the first occurrence of event F (rolling a "4" or "5") is on the second trial.

C. How many trials would you expect until you roll a "4" or "5"?

Exercises 38 - 43 contributed by Roberta Bloom


4.16 Review

The  next  two  questions  refer  to  the  following: 

A  recent  poll  concerning  credit  cards  found  that  35  percent  of  respondents  use  a  credit  card  that  gives  them 
a  mile  of  air  travel  for  every  dollar  they  charge.  Thirty  percent  of  the  respondents  charge  more  than  $2000 
per  month.  Of  those  respondents  who  charge  more  than  $2000, 80  percent  use  a  credit  card  that  gives  them 
a  mile  of  air  travel  for  every  dollar  they  charge. 

Exercise  4.16.1  (Solution  on  p.  219.) 

What  is  the  probability  that  a  randomly  selected  respondent  will  spend  more  than  $2000  AND 
use  a  credit  card  that  gives  them  a  mile  of  air  travel  for  every  dollar  they  charge? 

A.  (0.30)  (0.35) 

B.  (0.80)  (0.35) 

C.  (0.80)  (0.30) 

D.  (0.80) 

Exercise  4.16.2  (Solution  on  p.  219.) 

Based  upon  the  above  information,  are  using  a  credit  card  that  gives  a  mile  of  air  travel  for  each 
dollar  spent  AND  charging  more  than  $2000  per  month  independent  events? 

A.  Yes 

B.  No,  and  they  are  not  mutually  exclusive  either 

C.  No,  but  they  are  mutually  exclusive 

D.  Not  enough  information  given  to  determine  the  answer 
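
The two credit card questions rest on the multiplication rule, P(A AND B) = P(A|B) P(B), and on the fact that A and B are independent only if P(A|B) = P(A). The sketch below applies both checks to the poll figures. The variable names are illustrative.

```python
import math

p_miles = 0.35                   # P(uses a card that earns air miles)
p_over_2000 = 0.30               # P(charges more than $2000 per month)
p_miles_given_over_2000 = 0.80   # P(air miles card | charges more than $2000)

# Multiplication rule: P(A AND B) = P(A | B) * P(B)
p_both = p_miles_given_over_2000 * p_over_2000

# Independence would require P(A | B) to equal P(A).
independent = math.isclose(p_miles_given_over_2000, p_miles)

# Mutually exclusive events can never happen together: P(A AND B) = 0.
mutually_exclusive = (p_both == 0)

print(p_both, independent, mutually_exclusive)
```

Comparing P(A|B) with P(A) is usually the quickest independence test when a conditional probability is given.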

Exercise  4.16.3  (Solution  on  p.  219.) 

A  sociologist  wants  to  know  the  opinions  of  employed  adult  women  about  government  funding 
for  day  care.  She  obtains  a  list  of  520  members  of  a  local  business  and  professional  women's 
club  and  mails  a  questionnaire  to  100  of  these  women  selected  at  random.  68  questionnaires  are 
returned.  What  is  the  population  in  this  study? 

A.  All  employed  adult  women 

B.  All  the  members  of  a  local  business  and  professional  women's  club 

C.  The  100  women  who  received  the  questionnaire 

D.  All  employed  women  with  children 

The next two questions refer to the following: An article from The San Jose Mercury News was concerned with the racial mix of the 1500 students at Prospect High School in Saratoga, CA. The table summarizes the results. (Male and female values are approximate.) Suppose one Prospect High School student is randomly selected.


                        Ethnic Group
Gender    White    Asian    Hispanic    Black    American Indian
Male      400      168      115         35       16
Female    440      132      140         40       14

Table 4.19


This content is available online at <http://cnx.org/content/m16832/1.11/>.

Exercise 4.16.4 (Solution on p. 219.)

Find  the  probability  that  a  student  is  Asian  or  Male. 

Exercise  4.16.5  (Solution  on  p.  219.) 

Find  the  probability  that  a  student  is  Black  given  that  the  student  is  Female. 
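
Probabilities from a contingency table are ratios of counts. The sketch below stores Table 4.19 as nested dictionaries and computes an "or" probability with the addition rule and a conditional probability by restricting the denominator to the given group. The variable names are illustrative.

```python
from fractions import Fraction

# Counts copied from Table 4.19
counts = {
    "Male":   {"White": 400, "Asian": 168, "Hispanic": 115,
               "Black": 35, "American Indian": 16},
    "Female": {"White": 440, "Asian": 132, "Hispanic": 140,
               "Black": 40, "American Indian": 14},
}

total = sum(sum(row.values()) for row in counts.values())
n_male = sum(counts["Male"].values())
n_female = sum(counts["Female"].values())
n_asian = sum(row["Asian"] for row in counts.values())

# Addition rule: P(A OR B) = P(A) + P(B) - P(A AND B)
p_asian_or_male = Fraction(n_asian + n_male - counts["Male"]["Asian"], total)

# Conditional probability: only female students are in the denominator.
p_black_given_female = Fraction(counts["Female"]["Black"], n_female)

print(float(p_asian_or_male), float(p_black_given_female))
```

Subtracting the Asian male count prevents those students from being counted twice.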

Exercise  4.16.6  (Solution  on  p.  219.) 

A  sample  of  pounds  lost,  in  a  certain  month,  by  individual  members  of  a  weight  reducing  clinic 
produced  the  following  statistics: 

•  Mean  =  5  lbs. 

•  Median  =  4.5  lbs. 

•  Mode  =  4  lbs. 

•  Standard  deviation  =  3.8  lbs. 

•  First  quartile  =  2  lbs. 

• Third quartile = 8.5 lbs.

The  correct  statement  is: 


A.  One  fourth  of  the  members  lost  exactly  2  pounds. 

B.  The  middle  fifty  percent  of  the  members  lost  from  2  to  8.5  lbs. 

C.  Most  people  lost  3.5  to  4.5  lbs. 

D. All of the choices above are correct.


Exercise  4.16.7  (Solution  on  p.  219.) 

What  does  it  mean  when  a  data  set  has  a  standard  deviation  equal  to  zero? 

A.  All  values  of  the  data  appear  with  the  same  frequency. 

B.  The  mean  of  the  data  is  also  zero. 

C.  All  of  the  data  have  the  same  value. 

D.  There  are  no  data  to  begin  with. 

Exercise  4.16.8  (Solution  on  p.  219.) 

The statement that best describes the illustration below is:


[Illustration not reproduced.]

Figure 4.1


A.  The  mean  is  equal  to  the  median. 

B.  There  is  no  first  quartile. 

C.  The  lowest  data  value  is  the  median. 

D. The median equals (Q1 + Q3)/2.

Exercise 4.16.9 (Solution on p. 219.)

According to a recent article (San Jose Mercury News) the average number of babies born with significant hearing loss (deafness) is approximately 2 per 1000 babies in a healthy baby nursery. The number climbs to an average of 30 per 1000 babies in an intensive care nursery.

Suppose that 1000 babies from healthy baby nurseries were randomly surveyed. Find the probability that exactly 2 babies were born deaf.

Exercise  4.16.10  (Solution  on  p.  219.) 

A  "friend"  offers  you  the  following  "deal."  For  a  $10  fee,  you  may  pick  an  envelope  from  a  box 
containing  100  seemingly  identical  envelopes.  However,  each  envelope  contains  a  coupon  for  a 
free  gift. 

•  10  of  the  coupons  are  for  a  free  gift  worth  $6. 

•  80  of  the  coupons  are  for  a  free  gift  worth  $8. 

•  6  of  the  coupons  are  for  a  free  gift  worth  $12. 

•  4  of  the  coupons  are  for  a  free  gift  worth  $40. 

Based upon the financial gain or loss over the long run, should you play the game?

A. Yes, I expect to come out ahead in money.

B. No, I expect to come out behind in money.

C.  It  doesn't  matter.  I  expect  to  break  even. 


The  next  four  questions  refer  to  the  following:  Recently,  a  nurse  commented  that  when  a  patient  calls  the 
medical  advice  line  claiming  to  have  the  flu,  the  chance  that  he/she  truly  has  the  flu  (and  not  just  a  nasty 
cold)  is  only  about  4%.  Of  the  next  25  patients  calling  in  claiming  to  have  the  flu,  we  are  interested  in  how 
many  actually  have  the  flu. 

Exercise  4.16.11  (Solution  on  p.  219.) 

Define  the  Random  Variable  and  list  its  possible  values. 

Exercise  4.16.12  (Solution  on  p.  219.) 

State the distribution of X.

Exercise  4.16.13  (Solution  on  p.  219.) 

Find  the  probability  that  at  least  4  of  the  25  patients  actually  have  the  flu. 

Exercise  4.16.14  (Solution  on  p.  219.) 

On  average,  for  every  25  patients  calling  in,  how  many  do  you  expect  to  have  the  flu? 

The  next  two  questions  refer  to  the  following:  Different  types  of  writing  can  sometimes  be  distinguished 
by  the  number  of  letters  in  the  words  used.  A  student  interested  in  this  fact  wants  to  study  the  number  of 
letters  of  words  used  by  Tom  Clancy  in  his  novels.  She  opens  a  Clancy  novel  at  random  and  records  the 
number  of  letters  of  the  first  250  words  on  the  page. 

Exercise  4.16.15  (Solution  on  p.  219.) 

What  kind  of  data  was  collected? 


A.  qualitative 

B.  quantitative  -  continuous 

C.  quantitative  -  discrete 

Exercise  4.16.16  (Solution  on  p.  219.) 

What  is  the  population  under  study? 

4.17 Lab 1: Discrete Distribution (Playing Card Experiment)

Class  Time: 
Names: 

4.17.1  Student  Learning  Outcomes: 

• The student will compare empirical data and a theoretical distribution to determine if an everyday experiment fits a discrete distribution.

• The student will demonstrate an understanding of long-term probabilities.

4.17.2  Supplies: 

•  One  full  deck  of  playing  cards 

4.17.3  Procedure 

The  experiment  procedure  is  to  pick  one  card  from  a  deck  of  shuffled  cards. 

1. The theoretical probability of picking a diamond from a deck is: __________

2.  Shuffle  a  deck  of  cards. 

3.  Pick  one  card  from  it. 

4.  Record  whether  it  was  a  diamond  or  not  a  diamond. 

5.  Put  the  card  back  and  reshuffle. 

6.  Do  this  a  total  of  10  times 

7.  Record  the  number  of  diamonds  picked. 

8.  Let  X  =  number  of  diamonds.  Theoretically,  X  ~  B  (  ,  ) 

4.17.4  Organize  the  Data 

1.  Record  the  number  of  diamonds  picked  for  your  class  in  the  chart  below.  Then  calculate  the  relative 
frequency. 

This content is available online at <http://cnx.org/content/ml6827/1.12/>.
X  | Frequency | Relative Frequency
0  |           |
1  |           |
2  |           |
3  |           |
4  |           |
5  |           |
6  |           |
7  |           |
8  |           |
9  |           |
10 |           |

Table 4.20

2. Calculate the following:

a. x̄ =

b. s =

3.  Construct  a  histogram  of  the  empirical  data. 


[Figure 4.2: blank axes for the histogram. Vertical axis: Relative Frequency; horizontal axis: Number of Diamonds.]

4.17.5 Theoretical Distribution

1.  Build  the  theoretical  PDF  chart  based  on  the  distribution  in  the  Procedure  section  above. 


X  | P(x)
0  |
1  |
2  |
3  |
4  |
5  |
6  |
7  |
8  |
9  |
10 |

Table 4.21

2.  Calculate  the  following: 

a. μ =

b. σ =

3.  Construct  a  histogram  of  the  theoretical  distribution. 


[Figure 4.3: blank axes for the histogram. Vertical axis: Probability; horizontal axis: Number of Diamonds.]

4.17.6 Using the Data

Calculate the following, rounding to 4 decimal places:

NOTE:  RF  =  relative  frequency 
Use  the  table  from  the  section  titled  "Theoretical  Distribution"  here: 

• P(1 < x < 4) =

• P(x > 8) =

Use the data from the section titled "Organize the Data" here:

• RF(x = 3) =

• RF(1 < x < 4) =

• RF(x > 8) =

4.17.7  Discussion  Questions 

For questions 1. and 2., think about the shapes of the two graphs, the probabilities and the relative frequencies, the means, and the standard deviations.

1. Knowing that data vary, describe three similarities between the graphs and distributions of the theoretical and empirical distributions. Use complete sentences. (Note: These answers may vary and still be correct.)

2.  Describe  the  three  most  significant  differences  between  the  graphs  or  distributions  of  the  theoretical 
and  empirical  distributions.  (Note:  These  answers  may  vary  and  still  be  correct.) 

3.  Using  your  answers  from  the  two  previous  questions,  does  it  appear  that  the  data  fit  the  theoretical 
distribution?  In  1  -  3  complete  sentences,  explain  why  or  why  not. 

4. Suppose that the experiment had been repeated 500 times. Which table (from "Organize the Data" and "Theoretical Distribution") would you expect to change (and how would it change)? Why? Why wouldn't the other table change?
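
The repeated card-drawing procedure in this lab can also be simulated, which may help when thinking about question 4. The following is a minimal sketch, not part of the original lab: the function name `simulate_diamond_counts` and the seed value are illustrative assumptions, and it models the deck simply as 13 diamonds and 39 other cards, drawn with replacement.

```python
import random
from collections import Counter

def simulate_diamond_counts(repetitions, draws=10, seed=None):
    """Repeat the lab procedure and return relative frequencies of X.

    Each repetition draws one card `draws` times, with replacement,
    and counts how many draws were diamonds.
    """
    rng = random.Random(seed)
    deck = ["diamond"] * 13 + ["other"] * 39  # a standard 52-card deck
    counts = Counter()
    for _ in range(repetitions):
        diamonds = sum(1 for _ in range(draws) if rng.choice(deck) == "diamond")
        counts[diamonds] += 1
    # Relative frequency = frequency / number of repetitions
    return {x: counts[x] / repetitions for x in range(draws + 1)}

rel_freq = simulate_diamond_counts(500, seed=1)
for x, rf in rel_freq.items():
    print(x, round(rf, 4))
```

Running the function with different values of `repetitions` shows how the empirical table changes from run to run while the theoretical table stays fixed.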

4.18 Lab 2: Discrete Distribution (Lucky Dice Experiment)

Class  Time: 
Names: 

4.18.1  Student  Learning  Outcomes: 

•  The  student  will  compare  empirical  data  and  a  theoretical  distribution  to  determine  if  a  Tet  gambling 
game  fits  a  discrete  distribution. 

• The student will demonstrate an understanding of long-term probabilities.


4.18.2  Supplies: 

•  1  game  "Lucky  Dice"  or  3  regular  dice 

NOTE:  For  a  detailed  game  description,  refer  here.  (The  link  goes  to  the  beginning  of  Discrete 
Random  Variables  Homework.  Please  refer  to  Problem  #14.) 

NOTE:  Round  relative  frequencies  and  probabilities  to  four  decimal  places. 


4.18.3  The  Procedure 

1. The experiment procedure is to bet on one object. Then, roll 3 Lucky Dice and count the number of matches. The number of matches will decide your profit.

2.  What  is  the  theoretical  probability  of  1  die  matching  the  object?  

3.  Choose  one  object  to  place  a  bet  on.  Roll  the  3  Lucky  Dice.  Count  the  number  of  matches. 

4. Let X = number of matches. Theoretically, X ~ B( , )

5.  Let  Y  =  profit  per  game. 

4.18.4  Organize  the  Data 

In  the  chart  below,  fill  in  the  Y  value  that  corresponds  to  each  X  value.  Next,  record  the  number  of  matches 
picked  for  your  class.  Then,  calculate  the  relative  frequency. 

1.  Complete  the  table. 


X | Y | Frequency | Relative Frequency
0 |   |           |
1 |   |           |
2 |   |           |
3 |   |           |

Table 4.22

This  content  is  available  online  at  <http://cnx.org/content/ml6826/1.12/>. 



2. Calculate the following:

a. x̄ =

b. s_x =

c. ȳ =

d. s_y =

3. Explain what x̄ represents.

4. Explain what ȳ represents.

5.  Based  upon  the  experiment: 

a.  What  was  the  average  profit  per  game? 

b.  Did  this  represent  an  average  win  or  loss  per  game? 

c.  How  do  you  know?  Answer  in  complete  sentences. 

6. Construct a histogram of the empirical data.

[Figure 4.4: blank axes for the histogram. Vertical axis: Relative Frequency; horizontal axis: Number of Matches.]


4.18.5  Theoretical  Distribution 

1. Build the theoretical PDF chart for X and Y based on the distribution from the section titled "The Procedure".

X | Y | P(x) = P(y)
0 |   |
1 |   |
2 |   |
3 |   |

Table 4.23

2. Calculate the following:

a. μ_x =

b. σ_x =

c. μ_y =

3. Explain what μ_x represents.

4. Explain what μ_y represents.

5.  Based  upon  theory: 

a.  What  was  the  expected  profit  per  game? 

b.  Did  the  expected  profit  represent  an  average  win  or  loss  per  game? 

c.  How  do  you  know?  Answer  in  complete  sentences. 

6.  Construct  a  histogram  of  the  theoretical  distribution. 


[Figure 4.5: blank axes for the histogram. Vertical axis: Probability; horizontal axis: Number of Matches.]


4.18.6  Use  the  Data 

Calculate  the  following  (rounded  to  4  decimal  places): 

NOTE:  RF  =  relative  frequency 
Use  the  data  from  the  section  titled  "Theoretical  Distribution"  here: 

1. P(x = 3) =

2. P(0 < x < 3) =

3. P(x > 2) =

Use the data from the section titled "Organize the Data" here:

1. RF(x = 3) =

2. RF(0 < x < 3) =

3. RF(x > 2) =

4.18.7  Discussion  Question 

For  questions  1.  and  2.,  consider  the  graphs,  the  probabilities  and  relative  frequencies,  the  means  and  the 
standard  deviations. 

1. Knowing that data vary, describe three similarities between the graphs and distributions of the theoretical and empirical distributions. Use complete sentences. (Note: these answers may vary and still be correct.)

2.  Describe  the  three  most  significant  differences  between  the  graphs  or  distributions  of  the  theoretical 
and  empirical  distributions.  (Note:  these  answers  may  vary  and  still  be  correct.) 

3. Thinking about your answers to 1. and 2., does it appear that the data fit the theoretical distribution? In 1 - 3 complete sentences, explain why or why not.

4.  Suppose  that  the  experiment  had  been  repeated  500  times.  Which  table  (from  "Organize  the  Data"  or 
"Theoretical  Distribution")  would  you  expect  to  change?  Why?  How  might  the  table  change? 

Solutions to Exercises in Chapter 4

Solution  to  Example  4.2,  Problem  1  (p.  169) 

Let  X  =  the  number  of  days  Nancy  attends  class  per  week. 

Solution  to  Example  4.2,  Problem  2  (p.  169) 

0, 1,  2,  and  3 

Solution  to  Example  4.2,  Problem  3  (p.  169) 


x | P(x)
0 | 0.01
1 | 0.04
2 | 0.15
3 | 0.80

Table 4.24


Solution  to  Example  4.5,  Problem  1  (p.  171) 

X  =  amount  of  profit 

Solution  to  Example  4.5,  Problem  2  (p.  171) 


     | x  | P(x) | xP(x)
WIN  | 10 | 1/3  | 10/3
LOSE | −6 | 2/3  | −12/3

Table 4.25


Solution  to  Example  4.5,  Problem  3  (p.  172) 

Add the last column of the table. The expected value μ = 10/3 + (−12/3) = −2/3 ≈ −0.67. You lose, on average, about 67 cents each time you play the game, so you do not come out ahead.
Solution  to  Example  4.9,  Problem  1  (p.  173) 

failure 

Solution  to  Example  4.9,  Problem  2  (p.  173) 

X  =  the  number  of  statistics  students  who  do  their  homework  on  time 
Solution  to  Example  4.9,  Problem  3  (p.  173) 
0, 1, 2, ..., 50

Solution  to  Example  4.9,  Problem  4  (p.  173) 

Failure  is  a  student  who  does  not  do  his  or  her  homework  on  time. 
Solution  to  Example  4.9,  Problem  5  (p.  173) 
q  =  0.30 

Solution  to  Example  4.9,  Problem  6  (p.  174) 

greater than or equal to (≥)

Solution  to  Example  4.14,  Problem  2  (p.  176) 

1, 2,  3, . . .,  (total  number  of  students) 
Solution  to  Example  4.14,  Problem  3  (p.  176) 

•  p  =  0.55 

• q = 0.45

Solution to Example 4.14, Problem 1 (p. 176)

P(x = 4)

Solution to Example 4.18, Problem 1 (p. 178)

Without

Solution to Example 4.18, Problem 2 (p. 178)

The men

Solution to Example 4.18, Problem 3 (p. 179)

15 men

Solution to Example 4.18, Problem 4 (p. 179)

18 women

Solution to Example 4.18, Problem 5 (p. 179)

Let X = the number of men on the committee. x = 0, 1, 2, ..., 7.

Solution to Example 4.18, Problem 6 (p. 179)

P(x > 4)

Solution to Example 4.22, Problem 1 (p. 181)

One broadcast

Solution to Example 4.22, Problem 2 (p. 181)

2

Solution to Example 4.22, Problem 3 (p. 181)

Let X = the number of times the news reporter says "uh" during one broadcast.

x = 0, 1, 2, 3, ...

Solution  to  Example  4.22,  Problem  4  (p.  181) 

P(x  >  2) 

Solutions  to  Practice  2:  Binomial  Distribution 

Solution  to  Exercise  4.11.1  (p.  186) 
X = the number that reply "yes"
Solution to Exercise 4.11.2 (p. 186)

B(8, 0.713)

Solution  to  Exercise  4.11.3  (p.  186) 

0,1,2,3,4,5,6,7,8 

Solution  to  Exercise  4.11.5  (p.  186) 

5.7 

Solution  to  Exercise  4.11.6  (p.  186) 

1.28 

Solution  to  Exercise  4.11.7  (p.  187) 
0.4151 

Solution  to  Exercise  4.11.8  (p.  187) 

0.9990 


Solutions  to  Practice  3:  Poisson  Distribution 

Solution  to  Exercise  4.12.2  (p.  188) 

P(8) 

Solution  to  Exercise  4.12.3  (p.  188) 

0,1,2,3,4,... 

Solution  to  Exercise  4.12.5  (p.  188) 

No 

Solution  to  Exercise  4.12.6  (p.  188) 

No 

Solutions to Practice 4: Geometric Distribution

Solution  to  Exercise  4.13.2  (p.  189) 

G(0.713) 

Solution  to  Exercise  4.13.3  (p.  189) 

1,2,... 

Solution  to  Exercise  4.13.5  (p.  189) 

1.4 

Solution  to  Exercise  4.13.6  (p.  189) 

0.9176 

Solutions  to  Practice  5:  Hypergeometric  Distribution 

Solution  to  Exercise  4.14.2  (p.  191) 

H(16,7,9) 

Solution  to  Exercise  4.14.3  (p.  191) 

2,3,4,5,6,7,8,9 

Solution  to  Exercise  4.14.5  (p.  191) 
6.26 

Solutions  to  Homework 
Solution  to  Exercise  4.15.1  (p.  192) 

a.  0.1 

b.  1.6 

Solution  to  Exercise  4.15.3  (p.  192) 

b.  $200,000;$600,000;$400,000 

c.  third  investment 

d.  first  investment 

e.  second  investment 

Solution  to  Exercise  4.15.5  (p.  193) 

a.  0.2 

c.  2.35 

d.  2-3  children 

Solution  to  Exercise  4.15.7  (p.  194) 

a.  X  =  the  number  of  dice  that  show  a  1 

b.  0,1,2,3,4,5,6 

c. X ~ B(6, 1/6)

d.  1 

e.  0.00002 

f.  3  dice 

Solution  to  Exercise  4.15.9  (p.  194) 

a.  X  =  the  number  of  students  that  will  attend  Tet. 

b.  0, 1,  2,  3, 4,  5,  6,  7,  8, 9, 10, 11, 12 

c. X ~ B(12, 0.18)

d. 2.16

e. 0.9511

f.  0.3702 

Solution  to  Exercise  4.15.11  (p.  194) 

a.  X  =  the  number  of  fencers  that  do  not  use  foil  as  their  main  weapon 

b.  0, 1, 2,  3,...  25 

c. X ~ B(25, 0.40)

d.  10 

e.  0.0442 

f.  Yes 

Solution  to  Exercise  4.15.13  (p.  195) 

a.  X  =  the  number  of  fortune  cookies  that  have  an  extra  fortune 

b.  0, 1, 2,  3,...  144 

c.  X~B(144, 0.03)  or  P(4.32) 

d.  4.32 

e.  0.0124  or  0.0133 

f.  0.6300  or  0.6264 

Solution  to  Exercise  4.15.15  (p.  195) 

a.  X  =  the  number  of  women  that  suffer  from  anorexia 

b.  0, 1,  2,  3,...  600  (can  leave  off  600) 

c. X ~ P(3)

d.  3 

e.  0.0498 

f.  0.1847 

Solution  to  Exercise  4.15.17  (p.  196) 

a.  X  =  the  number  of  children  for  a  Spanish  woman 

b.  0,1,2,3,... 

c.  X~P(1.47) 

d.  0.2299 

e.  0.5679 

f.  0.4321 

Solution  to  Exercise  4.15.19  (p.  196) 

a.  X  =  the  number  of  dealers  she  calls  until  she  finds  one  with  a  used  red  Miata 

b.  1,2,3,... 

c.  X~G(0.28) 

d.  3.57 

e.  0.7313 

f.  0.2497 

Solution  to  Exercise  4.15.21  (p.  197) 

d.  4.31 

e.  0.4079 

f.  0.0163 

Solution  to  Exercise  4.15.23  (p.  197) 
d.  2 
e. 0.1353

f.  0.3233 

Solution  to  Exercise  4.15.25  (p.  197) 

a.  X  =  the  number  of  seniors  that  participated  in  after-school  sports  all  4  years  of  high  school 

b.  0, 1,  2,  3,...  60 

c.  X~P(4.8) 

d.  4.8 

e.  Yes 

f.  4 

Solution  to  Exercise  4.15.27  (p.  198) 

a.  X  =  the  number  of  shell  pieces  in  one  cake 

b.  0, 1,  2,  3,... 

c.  X~P(1.5) 

d.  1.5 

e.  0.2231 

f.  0.0001 

g.  Yes 

Solution  to  Exercise  4.15.29  (p.  198) 

d.  0.0043 

e.  none 

f.  3.3 

Solution  to  Exercise  4.15.31  (p.  198) 

d.  3.02 

e.  No 

f.  0.9997 

h.  0.3881 

i.  6.6207  pages 

Solution  to  Exercise  4.15.33  (p.  199) 

D:  4.43 

Solution  to  Exercise  4.15.34  (p.  199) 

A:  0.1476 

Solution  to  Exercise  4.15.35  (p.  199) 

C:  0.4734 

Solution  to  Exercise  4.15.36  (p.  199) 

A:  The  number  of  times  Mrs.  Plum's  cats  wake  her  up  each  week 
Solution  to  Exercise  4.15.37  (p.  200) 

D:  0.0671 

Solution  to  Exercise  4.15.38  (p.  200) 

Partial  Answer: 

A:  X  =  the  number  of  DVDs  a  Video  to  Go  customer  rents 

B:0.12 

C:0.11 

D:  0.77 

Solution  to  Exercise  4.15.39  (p.  200) 

The  variable  of  interest  is  X  =  net  gain  or  loss,  in  dollars 
The face cards are J, Q, K (Jack, Queen, King). There are (3)(4) = 12 face cards and 52 − 12 = 40 cards that are not face cards.

We  first  need  to  construct  the  probability  distribution  for  X.  We  use  the  card  and  coin  events  to  determine 
the  probability  for  each  outcome,  but  we  use  the  monetary  value  of  X  to  determine  the  expected  value. 


Card Event                   | $X net gain or loss | P(X)
Face Card and Heads          | 6                   | (12/52)(1/2) = 6/52
Face Card and Tails          | 2                   | (12/52)(1/2) = 6/52
(Not Face Card) and (H or T) | −2                  | (40/52)(1) = 40/52

Table 4.26


•  Expected  value  =  (6)(6/52)  +  (2)(6/52)  +  (-2)  (40/52)  =  -32/52 

•  Expected  value  =  -$0.62,  rounded  to  the  nearest  cent 

• If you play this game repeatedly, over a long number of games, you would expect to lose 62 cents per game, on average.

• You should not play this game to win money because the expected value indicates an expected average loss.
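
The expected-value arithmetic above can be checked with exact fractions. This is a minimal sketch under stated assumptions: the helper name `expected_value` and the list layout are illustrative choices, not part of the textbook; the outcomes and probabilities are those of Table 4.26.

```python
from fractions import Fraction

def expected_value(distribution):
    """Return the sum of x * P(x) over (x, P(x)) pairs.

    Raises ValueError if the probabilities do not sum to 1.
    """
    if sum(p for _, p in distribution) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(x * p for x, p in distribution)

# (net gain or loss in dollars, probability), as in Table 4.26
card_game = [
    (6, Fraction(12, 52) * Fraction(1, 2)),   # face card and heads
    (2, Fraction(12, 52) * Fraction(1, 2)),   # face card and tails
    (-2, Fraction(40, 52)),                   # not a face card
]

ev = expected_value(card_game)
print(ev, round(float(ev), 2))
```

Using `Fraction` keeps the result exact (it reduces −32/52 to −8/13) before rounding to the nearest cent.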

Solution  to  Exercise  4.15.40  (p.  201) 

Start  by  writing  the  probability  distribution.  X  is  net  gain  or  loss  =  prize  (if  any)  less  $10  cost  of  ticket 


X = $ net gain or loss | P(X)
$500 − $10 = $490      | 1/100
$100 − $10 = $90       | 2/100
$25 − $10 = $15        | 4/100
$0 − $10 = −$10        | 93/100

Table 4.27


Expected  Value  =  (490)(1/100)  +  (90)(2/100)  +  (15)(4/100)  +  (-10)  (93/100)  =  -$2.  There  is  an  expected  loss 
of  $2  per  ticket,  on  average. 
Solution  to  Exercise  4.15.41  (p.  201) 

•  X  =  number  of  questions  answered  correctly 

•  X~B(10,0.5) 

•  We  are  interested  in  AT  LEAST  70%  of  10  questions  correct.  70%  of  10  is  7.  We  want  to  find  the 
probability  that  X  is  greater  than  or  equal  to  7.  The  event  "at  least  7"  is  the  complement  of  "less  than 
or  equal  to  6". 

•  Using  your  calculator's  distribution  menu:  1  -  binomcdf(10,  .5, 6)  gives  0.171875 

• The probability of getting at least 70% of the 10 questions correct when randomly guessing is approximately 0.172.
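
For readers without a calculator distribution menu, the same complement calculation can be sketched in Python. The function names `binom_pmf` and `binom_cdf` are illustrative assumptions; the argument order (n, p, k) is chosen here only to mirror the calculator call shown above.

```python
from math import comb

def binom_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(n, p, k):
    """P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(n, p, i) for i in range(k + 1))

# "At least 7" is the complement of "less than or equal to 6"
p_at_least_7 = 1 - binom_cdf(10, 0.5, 6)
print(round(p_at_least_7, 6))
```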

Solution  to  Exercise  4.15.42  (p.  201) 

•  X  =  number  of  questions  answered  correctly 

• X ~ B(32, 1/3)

• We are interested in MORE THAN 75% of 32 questions correct. 75% of 32 is 24. We want to find P(x > 24). The event "more than 24" is the complement of "less than or equal to 24".

•  Using  your  calculator's  distribution  menu:  1  -  binomcdf(32, 1/3, 24) 

•  P(x>24)  =  0.00000026761 

•  The  probability  of  getting  more  than  75%  of  the  32  questions  correct  when  randomly  guessing  is  very 
small  and  practically  zero. 

Solution  to  Exercise  4.15.43  (p.  201) 

A:  X  can  take  on  the  values  1, 2, 3, ....  p  =  2/6,  q  =  4/6 

B:  0.2222 

C:3 


Solutions  to  Review 

Solution  to  Exercise  4.16.1  (p.  202) 

C 

Solution  to  Exercise  4.16.2  (p.  202) 

B 

Solution  to  Exercise  4.16.3  (p.  202) 

A 

Solution  to  Exercise  4.16.4  (p.  203) 

0.5773 

Solution  to  Exercise  4.16.5  (p.  203) 

0.0522 

Solution  to  Exercise  4.16.6  (p.  203) 

B 

Solution  to  Exercise  4.16.7  (p.  203) 

C 

Solution  to  Exercise  4.16.8  (p.  203) 

C 

Solution  to  Exercise  4.16.9  (p.  204) 

0.2709 

Solution  to  Exercise  4.16.10  (p.  204) 
B 

Solution  to  Exercise  4.16.11  (p.  204) 

X = the number of patients calling in claiming to have the flu, who actually have the flu. X = 0, 1, 2, ..., 25
Solution to Exercise 4.16.12 (p. 204)

B(25, 0.04)

Solution  to  Exercise  4.16.13  (p.  204) 
0.0165 

Solution  to  Exercise  4.16.14  (p.  204) 

1 

Solution  to  Exercise  4.16.15  (p.  204) 

C 

Solution  to  Exercise  4.16.16  (p.  204) 

All  words  used  by  Tom  Clancy  in  his  novels 

Chapter 5

Continuous  Random  Variables 


5.1 Continuous Random Variables

5.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

•  Recognize  and  understand  continuous  probability  density  functions  in  general. 

• Recognize the uniform probability distribution and apply it appropriately.

•  Recognize  the  exponential  probability  distribution  and  apply  it  appropriately. 


5.1.2  Introduction 

Continuous  random  variables  have  many  applications.  Baseball  batting  averages,  IQ  scores,  the  length 
of  time  a  long  distance  telephone  call  lasts,  the  amount  of  money  a  person  carries,  the  length  of  time  a 
computer  chip  lasts,  and  SAT  scores  are  just  a  few.  The  field  of  reliability  depends  on  a  variety  of  continuous 
random  variables. 

This  chapter  gives  an  introduction  to  continuous  random  variables  and  the  many  continuous  distributions. 
We  will  be  studying  these  continuous  distributions  for  several  chapters. 

NOTE:  The  values  of  discrete  and  continuous  random  variables  can  be  ambiguous.  For  example, 
if  X  is  equal  to  the  number  of  miles  (to  the  nearest  mile)  you  drive  to  work,  then  X  is  a  discrete 
random  variable.  You  count  the  miles.  If  X  is  the  distance  you  drive  to  work,  then  you  measure 
values  of  X  and  X  is  a  continuous  random  variable.  How  the  random  variable  is  defined  is  very 
important. 

5.1.3  Properties  of  Continuous  Probability  Distributions 

The  graph  of  a  continuous  probability  distribution  is  a  curve.  Probability  is  represented  by  area  under  the 
curve. 

The curve is called the probability density function (abbreviated: pdf). We use the symbol f(x) to represent the curve. f(x) is the function that corresponds to the graph; we use the density function f(x) to draw the graph of the probability distribution.

^This  content  is  available  online  at  <http://cnx.org/content/ml6808/1.12/>. 

Area under the curve is given by a different function called the cumulative distribution function (abbreviated: cdf). The cumulative distribution function is used to evaluate probability as area.

•  The  outcomes  are  measured,  not  counted. 

•  The  entire  area  under  the  curve  and  above  the  x-axis  is  equal  to  1 . 

•  Probability  is  found  for  intervals  of  x  values  rather  than  for  individual  x  values. 

• P(c < x < d) is the probability that the random variable X is in the interval between the values c and d. P(c < x < d) is the area under the curve, above the x-axis, to the right of c and the left of d.

• P(x = c) = 0. The probability that x takes on any single individual value is 0. The area below the curve, above the x-axis, and between x = c and x = c has no width, and therefore no area (area = 0). Since the probability is equal to the area, the probability is also 0.

We  will  find  the  area  that  represents  probability  by  using  geometry,  formulas,  technology,  or  probability 
tables.  In  general,  calculus  is  needed  to  find  the  area  under  the  curve  for  many  probability  density  functions. 
When  we  use  formulas  to  find  the  area  in  this  textbook,  the  formulas  were  found  by  using  the  techniques 
of  integral  calculus.  However,  because  most  students  taking  this  course  have  not  studied  calculus,  we  will 
not  be  using  calculus  in  this  textbook. 

There  are  many  continuous  probability  distributions.  When  using  a  continuous  probability  distribution  to 
model  probability,  the  distribution  used  is  selected  to  best  model  and  fit  the  particular  situation. 

In  this  chapter  and  the  next  chapter,  we  will  study  the  uniform  distribution,  the  exponential  distribution, 
and  the  normal  distribution.  The  following  graphs  illustrate  these  distributions. 

[Graph: The Uniform Distribution. The shaded area represents P(3 < x < 6).]

Figure 5.1: The graph shows a Uniform Distribution with the area between x = 3 and x = 6 shaded to represent the probability that the value of the random variable X is in the interval between 3 and 6.

[Graph: The Exponential Distribution. The shaded area represents P(2 < x < 4).]

Figure 5.2: The graph shows an Exponential Distribution with the area between x = 2 and x = 4 shaded to represent the probability that the value of the random variable X is in the interval between 2 and 4.


[Graph: The Normal Distribution. The shaded area represents the probability P(1 < x < 2).]

Figure 5.3: The graph shows the Standard Normal Distribution with the area between x = 1 and x = 2 shaded to represent the probability that the value of the random variable X is in the interval between 1 and 2.


'With  contributions  from  Roberta  Bloom 


5.2  Continuous  Probability  Functions^ 

We begin by defining a continuous probability density function. We use the function notation f(x). Intermediate algebra may have been your first formal introduction to functions. In the study of probability, the functions we study are special. We define the function f(x) so that the area between it and the x-axis is equal to a probability. Since the maximum probability is one, the maximum area is also one.

For  continuous  probability  distributions,  PROBABILITY  =  AREA. 

^This  content  is  available  online  at  <http://cnx.Org/content/ml6805/l.9/>. 



Example  5.1 

Consider the function f(x) = 1/20 for 0 ≤ x ≤ 20, where x is a real number. The graph of f(x) = 1/20 is a horizontal line. However, since 0 ≤ x ≤ 20, f(x) is restricted to the portion between x = 0 and x = 20, inclusive.


[Graph: f(x) = 1/20 drawn as a horizontal line segment from x = 0 to x = 20.]

f(x) = 1/20 for 0 ≤ x ≤ 20.

The graph of f(x) = 1/20 is a horizontal line segment when 0 ≤ x ≤ 20.

The area between f(x) = 1/20 where 0 ≤ x ≤ 20 and the x-axis is the area of a rectangle with base = 20 and height = 1/20.

AREA = 20 · (1/20) = 1

This particular function, where we have restricted x so that the area between the function and the x-axis is 1, is an example of a continuous probability density function. It is used as a tool to calculate probabilities.

Suppose we want to find the area between f(x) = 1/20 and the x-axis where 0 < x < 2.

[Graph: f(x) = 1/20 on 0 ≤ x ≤ 20, with the area between x = 0 and x = 2 shaded.]

AREA = (2 − 0) · (1/20) = 0.1

(2 − 0) = 2 = the base of a rectangle

1/20 = the height.


The  area  corresponds  to  a  probability.  The  probability  that  x  is  between  0  and  2  is  0.1,  which  can 
be  written  mathematically  as  P(0<x<2)  =  P(x<2)  =  0.1. 
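
The base-times-height reasoning can be written as a small function. This is a minimal sketch, assuming a uniform density on [a, b]; the name `uniform_probability` is an illustrative choice and does not appear in the textbook.

```python
def uniform_probability(a, b, c, d):
    """P(c < X < d) for X ~ U(a, b), computed as a rectangle area."""
    if not a < b:
        raise ValueError("require a < b")
    height = 1 / (b - a)                  # value of f(x) on [a, b]
    lo, hi = max(a, c), min(b, d)         # clip the interval to [a, b]
    base = max(0.0, hi - lo)              # zero width gives zero area
    return base * height

print(uniform_probability(0, 20, 0, 2))   # area between x = 0 and x = 2
```

Because probability is area, an interval of zero width (a single value of x) always gives probability 0.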


Suppose we want to find the area between f(x) = 1/20 and the x-axis where 4 < x < 15.

[Graph: f(x) = 1/20 on 0 ≤ x ≤ 20, with the area between x = 4 and x = 15 shaded.]

AREA = (15 − 4) · (1/20) = 0.55

(15 − 4) = 11 = the base of a rectangle

1/20 = the height.

The area corresponds to the probability P(4 < x < 15) = 0.55.

Suppose we want to find P(x = 15). On an x-y graph, x = 15 is a vertical line. A vertical line has no width (or 0 width). Therefore, P(x = 15) = (base)(height) = (0)(1/20) = 0.

[Graph: f(x) = 1/20 on 0 ≤ x ≤ 20, with a vertical line at x = 15.]


P(X ≤ x) (can be written as P(X < x) for continuous distributions) is called the cumulative distribution function or CDF. Notice the "less than or equal to" symbol. We can use the CDF to calculate P(X > x). The CDF gives "area to the left" and P(X > x) gives "area to the right." We calculate P(X > x) for continuous distributions as follows: P(X > x) = 1 − P(X < x).
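
The "area to the left" and "area to the right" relationship can be sketched for the uniform density used in this example. The function names `uniform_cdf` and `uniform_right_tail` are hypothetical, chosen for illustration only.

```python
def uniform_cdf(x, a, b):
    """P(X <= x) for X ~ U(a, b): the area to the left of x."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

def uniform_right_tail(x, a, b):
    """P(X > x) = 1 - P(X <= x): the area to the right of x."""
    return 1.0 - uniform_cdf(x, a, b)

left = uniform_cdf(15, 0, 20)
right = uniform_right_tail(15, 0, 20)
print(left, right)
```

The two areas always add to 1, the total area under the density.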


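[Graph: the area under f(x) to the left of x is labeled P(X < x); the area to the right of x is labeled P(X > x) = 1 − P(X < x).]

Label the graph with f(x) and x. Scale the x and y axes with the maximum x and y values. f(x) = 1/20, 0 ≤ x ≤ 20.

[Graph: f(x) = 1/20 with the area between x = 2.3 and x = 12.7 shaded.]

P(2.3 < x < 12.7) = (base)(height) = (12.7 − 2.3)(1/20) = 0.52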

5.3 The Uniform Distribution

Example  5.2 

The  previous  problem  is  an  example  of  the  uniform  probability  distribution. 

Illustrate  the  uniform  distribution.  The  data  that  follows  are  55  smiling  times,  in  seconds,  of  an 
eight-week  old  baby. 


10.4 | 19.6 | 18.8 | 13.9 | 17.8 | 16.8 | 21.6 | 17.9 | 12.5 | 11.1 | 4.9
12.8 | 14.8 | 22.8 | 20.0 | 15.9 | 16.3 | 13.4 | 17.1 | 14.5 | 19.0 | 22.8
1.3  | 0.7  | 8.9  | 11.9 | 10.9 | 7.3  | 5.9  | 3.7  | 17.9 | 19.2 | 9.8
5.8  | 6.9  | 2.6  | 5.8  | 21.7 | 11.8 | 3.4  | 2.1  | 4.5  | 6.3  | 10.7
8.9  | 9.4  | 9.4  | 7.6  | 10.0 | 3.3  | 6.7  | 7.8  | 11.6 | 13.8 | 18.6

Table 5.1


sample  mean  =  11.49  and  sample  standard  deviation  =  6.23 

We  will  assume  that  the  smiling  times,  in  seconds,  follow  a  uniform  distribution  between  0  and  23 
seconds,  inclusive.  This  means  that  any  smiling  time  from  0  to  and  including  23  seconds  is  equally 
likely.  The  histogram  that  could  be  constructed  from  the  sample  is  an  empirical  distribution  that 
closely  matches  the  theoretical  uniform  distribution. 

Let  X  =  length,  in  seconds,  of  an  eight-week  old  baby's  smile. 

The  notation  for  the  uniform  distribution  is 

X ~ U(a, b) where a = the lowest value of x and b = the highest value of x.
The probability density function is f(x) = 1/(b − a) for a ≤ x ≤ b.

For this example, X ~ U(0, 23) and f(x) = 1/(23 − 0) for 0 ≤ x ≤ 23.
Formulas for the theoretical mean and standard deviation are

μ = (a + b)/2 and σ = √((b − a)²/12)

For this problem, the theoretical mean and standard deviation are
μ = (0 + 23)/2 = 11.50 seconds and σ = √((23 − 0)²/12) = 6.64 seconds

Notice that the theoretical mean and standard deviation are close to the sample mean and standard deviation.
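
The theoretical mean and standard deviation formulas for X ~ U(a, b) can be evaluated directly. A minimal sketch follows; the function names `uniform_mean` and `uniform_sd` are illustrative assumptions rather than notation from the text.

```python
from math import sqrt

def uniform_mean(a, b):
    """Theoretical mean of X ~ U(a, b): (a + b) / 2."""
    return (a + b) / 2

def uniform_sd(a, b):
    """Theoretical standard deviation of X ~ U(a, b): sqrt((b - a)^2 / 12)."""
    return sqrt((b - a) ** 2 / 12)

# Smiling times, assumed uniform between 0 and 23 seconds
mu = uniform_mean(0, 23)
sigma = uniform_sd(0, 23)
print(mu, round(sigma, 2))
```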

Example  5.3 
Problem  1 

What is the probability that a randomly chosen eight-week old baby smiles between 2 and 18 seconds?

Solution 

Find  P  (2  <  X  <  18). 
^This  content  is  available  online  at  <http://cnx.org/content/ml6819/1.17/>. 

P(2 < x < 18) = (base)(height) = (18 − 2) · (1/23) = 16/23.

[Graph: f(x) = 1/23 on 0 ≤ x ≤ 23, with the area between x = 2 and x = 18 shaded.]


Problem  2 

Find  the  90th  percentile  for  an  eight  week  old  baby's  smiling  time. 
Solution 

Ninety percent of the smiling times fall below the 90th percentile, k, so P(x < k) = 0.90.

P(x < k) = 0.90
(base)(height) = 0.90
(k − 0) · (1/23) = 0.90
k = 23 · 0.90 = 20.7

[Graph: f(x) for X ~ U(0, 23); the shaded area to the left of k is P(x < k) = 0.90.]
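
Solving (k − a) · 1/(b − a) = p for k gives k = a + p(b − a), which reduces to the calculation above when a = 0. The sketch below applies that rearrangement; the function name `uniform_percentile` is a hypothetical label used only for illustration.

```python
def uniform_percentile(p, a, b):
    """Return k such that P(X < k) = p for X ~ U(a, b)."""
    if not 0 <= p <= 1:
        raise ValueError("p must be between 0 and 1")
    # (k - a) * 1/(b - a) = p  rearranges to  k = a + p * (b - a)
    return a + p * (b - a)

k = uniform_percentile(0.90, 0, 23)
print(round(k, 1))
```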


Problem  3 

Find  the  probability  that  a  random  eight  week  old  baby  smiles  more  than  12  seconds  KNOWING 
that  the  baby  smiles  MORE  THAN  8  SECONDS. 

Solution 

Find P(x > 12 | x > 8). There are two ways to do the problem. For the first way, use the fact that this is a conditional and changes the sample space. The graph illustrates the new sample space. You already know the baby smiled more than 8 seconds.

Write a new f(x): f(x) = 1/(23 − 8) = 1/15 for 8 < x < 23.

P(x > 12 | x > 8) = (23 − 12) · (1/15) = 11/15

[Graph: f(x) = 1/15 for 8 < x < 23, with the area between x = 12 and x = 23 shaded.]


For the second way, use the conditional formula from Probability Topics with the original distribution X ~ U(0, 23):

P(A|B) = P(A AND B) / P(B). For this problem, A is (x > 12) and B is (x > 8).

So, P(x > 12 | x > 8) = P(x > 12 AND x > 8) / P(x > 8) = P(x > 12) / P(x > 8) = (11/23) / (15/23) = 11/15 = 0.733
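
Both approaches to the conditional probability can be checked against each other with exact fractions. This is a sketch under the example's assumptions; the helper `right_tail` is a hypothetical name and is only valid for a ≤ x ≤ b.

```python
from fractions import Fraction

def right_tail(x, a, b):
    """P(X > x) for X ~ U(a, b), valid for a <= x <= b."""
    return Fraction(b - x, b - a)

# First way: restrict the sample space to 8 < x < 23, so X ~ U(8, 23)
way1 = right_tail(12, 8, 23)

# Second way: P(A|B) = P(A AND B) / P(B) with the original X ~ U(0, 23);
# here (x > 12 AND x > 8) is the same event as (x > 12)
way2 = right_tail(12, 0, 23) / right_tail(8, 0, 23)

print(way1, way2, round(float(way1), 3))
```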


Example  5.4 

Uniform: The amount of time, in minutes, that a person must wait for a bus is uniformly distributed between 0 and 15 minutes, inclusive.
Problem  1 

What  is  the  probability  that  a  person  waits  fewer  than  12.5  minutes? 
Solution 

Let X = the number of minutes a person must wait for a bus. a = 0 and b = 15. X ~ U(0, 15). Write
the probability density function. $f(x) = \frac{1}{15 - 0} = \frac{1}{15}$ for 0 ≤ x ≤ 15.

Find P(x < 12.5). Draw a graph.

$P(x < k) = (\text{base})(\text{height}) = (12.5 - 0) \cdot \frac{1}{15} = 0.8333$

The  probability  a  person  waits  less  than  12.5  minutes  is  0.8333. 


[Figure: uniform density f(x) = 1/15 on 0 to 15, with x = 12.5 marked.]


Available  for  free  at  Connexions  <http:/ /cnx.org/content/coll0522/1.40> 




Problem  2 

On  the  average,  how  long  must  a  person  wait? 

Find the mean, μ, and the standard deviation, σ.
Solution

$\mu = \frac{a + b}{2} = \frac{15 + 0}{2} = 7.5$. On the average, a person must wait 7.5 minutes.

$\sigma = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(15 - 0)^2}{12}} = 4.3$. The standard deviation is 4.3 minutes.


Problem  3 

Ninety  percent  of  the  time,  the  time  a  person  must  wait  falls  below  what  value? 
Note:  This  asks  for  the  90th  percentile. 

Solution 

Find the 90th percentile. Draw a graph. Let k = the 90th percentile.

$P(x < k) = (\text{base})(\text{height}) = (k - 0) \cdot \frac{1}{15}$
$0.90 = k \cdot \frac{1}{15}$

k = (0.90)(15) = 13.5

k  is  sometimes  called  a  critical  value. 

The  90th  percentile  is  13.5  minutes.  Ninety  percent  of  the  time,  a  person  must  wait  at  most  13.5 
minutes. 

[Figure: uniform density f(x) = 1/15 starting at 0, with k marked; AREA = P(x < k) = 0.90.]
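The three bus-wait calculations in this example (a probability, the mean and standard deviation, and a percentile) can be reproduced in a few lines. This is a hedged sketch rather than anything from the original text; the variable names are illustrative, and the percentile line simply solves (k - a)(height) = 0.90 for k.

```python
import math

a, b = 0.0, 15.0                      # X ~ U(0, 15), waiting time in minutes

height = 1 / (b - a)                  # f(x) = 1/(b - a)
p_less_12_5 = (12.5 - a) * height     # (base)(height)
mean = (a + b) / 2
std_dev = math.sqrt((b - a) ** 2 / 12)
k90 = a + 0.90 * (b - a)              # solve (k - a) * height = 0.90 for k

print(round(p_less_12_5, 4), mean, round(std_dev, 1), round(k90, 1))
```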


Example  5.5 

Uniform: Suppose the time it takes a nine-year old to eat a donut is between 0.5 and 4 minutes,
inclusive. Let X = the time, in minutes, it takes a nine-year old child to eat a donut. Then
X ~ U(0.5, 4).

Problem  1  (Solution  on  p.  257.) 

The probability that a randomly selected nine-year old child eats a donut in at least two minutes
is ______.

Problem  2  (Solution  on  p.  257.) 

Find  the  probability  that  a  different  nine-year  old  child  eats  a  donut  in  more  than  2  minutes  given 
that  the  child  has  already  been  eating  the  donut  for  more  than  1.5  minutes. 
The second probability question has a conditional (refer to "Probability Topics (Section 3.1)"). You
are asked to find the probability that a nine-year old child eats a donut in more than 2 minutes
given that the child has already been eating the donut for more than 1.5 minutes. Solve the
problem two different ways (see the first example (Example 5.2)). You must reduce the sample space.
First way: Since you already know the child has already been eating the donut for more than 1.5
minutes, you are no longer starting at a = 0.5 minutes. Your starting point is 1.5 minutes.

Write a new f(x):

$f(x) = \frac{1}{4 - 1.5} = \frac{2}{5}$ for 1.5 ≤ x ≤ 4.
Find P(x > 2 | x > 1.5). Draw a graph.


$P(x > 2 \mid x > 1.5) = (\text{base})(\text{new height}) = (4 - 2)(2/5) = \frac{4}{5}$

The probability that a nine-year old child eats a donut in more than 2 minutes given that the child
has already been eating the donut for more than 1.5 minutes is 4/5.

Second way: Draw the original graph for x ~ U(0.5, 4). Use the conditional formula

$P(x > 2 \mid x > 1.5) = \frac{P(x > 2 \text{ AND } x > 1.5)}{P(x > 1.5)} = \frac{P(x > 2)}{P(x > 1.5)} = \frac{2/3.5}{2.5/3.5} = 0.8 = \frac{4}{5}$

NOTE:  See  "Summary  of  the  Uniform  and  Exponential  Probability  Distributions  (Section  5.5)"  for 
a  full  summary. 

Example  5.6 

Uniform: Ace Heating and Air Conditioning Service finds that the amount of time a repairman
needs to fix a furnace is uniformly distributed between 1.5 and 4 hours. Let x = the time needed
to fix a furnace. Then x ~ U(1.5, 4).

1. Find the probability that a randomly selected furnace repair requires more than 2 hours.

2. Find the probability that a randomly selected furnace repair requires less than 3 hours.

3. Find the 30th percentile of furnace repair times.

4. The longest 25% of furnace repairs take at least how long? (In other words: Find the
minimum time for the longest 25% of repair times.) What percentile does this represent?

5. Find the mean and standard deviation.

Problem  1 

Find  the  probability  that  a  randomly  selected  furnace  repair  requires  longer  than  2  hours. 
Solution 

To find f(x): $f(x) = \frac{1}{4 - 1.5} = \frac{1}{2.5}$ so f(x) = 0.4
P(x > 2) = (base)(height) = (4 - 2)(0.4) = 0.8



Figure  5.4:  Uniform  Distribution  between  1.5  and  4  with  shaded  area  between  2  and  4  representing  the 
probability  that  the  repair  time  x  is  greater  than  2 


Problem  2 

Find  the  probability  that  a  randomly  selected  furnace  repair  requires  less  than  3  hours.  Describe 
how  the  graph  differs  from  the  graph  in  the  first  part  of  this  example. 

Solution 

P  (x  <  3)  =  (base)(height)  =  (3  -  1.5)(0.4)  =  0.6 

The graph of the rectangle showing the entire distribution would remain the same. However, the
graph should be shaded between x = 1.5 and x = 3. Note that the shaded area starts at x = 1.5 rather
than at x = 0; since X ~ U(1.5, 4), x cannot be less than 1.5.



Figure  5.5:  Uniform  Distribution  between  1.5  and  4  with  shaded  area  between  1.5  and  3  representing  the 
probability  that  the  repair  time  x  is  less  than  3 


Problem  3 

Find  the  30th  percentile  of  furnace  repair  times. 
Solution 
Figure 5.6: Uniform Distribution between 1.5 and 4 with an area of 0.30 shaded to the left, representing the
shortest 30% of repair times.


P(x < k) = 0.30

P(x < k) = (base)(height) = (k - 1.5) · (0.4)

0.3 = (k - 1.5)(0.4); Solve to find k:

0.75 = k - 1.5, obtained by dividing both sides by 0.4

k = 2.25, obtained by adding 1.5 to both sides

The 30th percentile of repair times is 2.25 hours. 30% of repair times are 2.25 hours or less.


Problem  4 

The  longest  25%  of  furnace  repair  times  take  at  least  how  long?  (Find  the  minimum  time  for 
the  longest  25%  of  repairs.) 

Solution 




Figure  5.7:  Uniform  Distribution  between  1.5  and  4  with  an  area  of  0.25  shaded  to  the  right  representing 
the  longest  25%  of  repair  times. 


P(x > k) = 0.25

P(x > k) = (base)(height) = (4 - k) · (0.4)
0.25 = (4 - k)(0.4); Solve for k:

0.625 = 4 - k, obtained by dividing both sides by 0.4
-3.375 = -k, obtained by subtracting 4 from both sides
k = 3.375

The longest 25% of furnace repairs take at least 3.375 hours (3.375 hours or longer).

Note:  Since  25%  of  repair  times  are  3.375  hours  or  longer,  that  means  that  75%  of  repair  times  are 
3.375  hours  or  less.  3.375  hours  is  the  75th  percentile  of  furnace  repair  times. 
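Both percentile problems above solve (k - a)(height) = area for k. A short sketch of that step follows; it is an illustration under the assumption X ~ U(a, b), and the function name `uniform_percentile` is hypothetical rather than taken from the text.

```python
def uniform_percentile(a, b, area_left):
    """Value k with P(X < k) = area_left for X ~ U(a, b)."""
    return a + area_left * (b - a)

a, b = 1.5, 4.0                              # furnace repair time in hours
k30 = uniform_percentile(a, b, 0.30)         # 30th percentile
k75 = uniform_percentile(a, b, 1 - 0.25)     # longest 25% begins at the 75th percentile

print(round(k30, 3), round(k75, 3))
```

Writing the longest 25% as an area of 1 - 0.25 to the left makes explicit why the answer to Problem 4 is also the 75th percentile.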

Problem  5 

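Find the mean and standard deviation.
Solution

$\mu = \frac{a + b}{2} = \frac{1.5 + 4}{2} = 2.75$ hours

$\sigma = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(4 - 1.5)^2}{12}} = 0.7217$ hours

NOTE: See "Summary of the Uniform and Exponential Probability Distributions (Section 5.5)" for
a full summary.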


Example  5  contributed  by  Roberta  Bloom 


5.4 The Exponential Distribution

This content is available online at <http://cnx.org/content/ml6816/1.15/>.

The exponential distribution is often concerned with the amount of time until some specific event occurs.
For example, the amount of time (beginning now) until an earthquake occurs has an exponential
distribution. Other examples include the length, in minutes, of long distance business telephone calls, and the
amount of time, in months, a car battery lasts. It can be shown, too, that the value of the change that you
have in your pocket or purse approximately follows an exponential distribution.

Values for an exponential random variable occur in the following way. There are fewer large values and
more small values. For example, the amount of money customers spend in one trip to the supermarket
follows an exponential distribution. There are more people who spend small amounts of money and fewer
people who spend large amounts of money.

The exponential distribution is widely used in the field of reliability. Reliability deals with the amount of
time a product lasts.

Example 5.7

Illustrates the exponential distribution: Let X = amount of time (in minutes) a postal clerk
spends with his/her customer. The time is known to have an exponential distribution with the
average amount of time equal to 4 minutes.

X is a continuous random variable since time is measured. It is given that μ = 4 minutes. To do
any calculations, you must know m, the decay parameter.

$m = \frac{1}{\mu}$. Therefore, $m = \frac{1}{4} = 0.25$

The standard deviation, σ, is the same as the mean: μ = σ

The  distribution  notation  is  X~Exp  (m).  Therefore,  X~Exp  (0.25). 

The probability density function is $f(x) = m e^{-m x}$. The number e = 2.71828182846... It is a number
that is used often in mathematics. Scientific calculators have the key "e^x." If you enter 1 for x, the
calculator will display the value e.

The curve is:

$f(x) = 0.25 e^{-0.25 x}$ where x is at least 0 and m = 0.25.
For example, $f(5) = 0.25 e^{-0.25 \cdot 5} = 0.072$
The  graph  is  as  follows: 


[Figure: graph of f(x) for x from 0 to 20, a declining curve that starts at m = 0.25; the mean μ = 4 is marked on the x-axis.]


Notice  the  graph  is  a  declining  curve.  When  x  =  0, 

$f(x) = 0.25 e^{-0.25 \cdot 0} = 0.25 \cdot 1 = 0.25 = m$
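The density values quoted above can be checked directly. The sketch below is an illustration only; the function name `exp_pdf` is an assumption, and it evaluates the density with the standard library rather than a calculator.

```python
import math

def exp_pdf(x, m):
    """Density f(x) = m * e^(-m*x) for x >= 0, and 0 otherwise."""
    return m * math.exp(-m * x) if x >= 0 else 0.0

m = 1 / 4                 # decay parameter from the mean of 4 minutes
print(round(exp_pdf(5, m), 3), exp_pdf(0, m))
```

Evaluating at x = 0 returns m itself, which matches the observation that the curve starts at height m.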

Example  5.8 
Problem  1 

Find  the  probability  that  a  clerk  spends  four  to  five  minutes  with  a  randomly  selected  customer. 
Solution 

Find  P  (4  <  x  <  5). 

The  cumulative  distribution  function  (CDF)  gives  the  area  to  the  left. 

$P(X < x) = 1 - e^{-m x}$

$P(x < 5) = 1 - e^{-0.25 \cdot 5} = 0.7135$ and $P(x < 4) = 1 - e^{-0.25 \cdot 4} = 0.6321$

[Figure: graph of f(x) starting at 0.25, with the area P(4 < x < 5) between x = 4 and x = 5 marked.]


NOTE:  You  can  do  these  calculations  easily  on  a  calculator. 

The  probability  that  a  postal  clerk  spends  four  to  five  minutes  with  a  randomly  selected  customer 
is 

P  (4  <  X  <  5)  =  P  (x  <  5)  -  P  (x  <  4)  =  0.7135  -  0.6321  =  0.0814 

NOTE: TI-83+ and TI-84: On the home screen, enter (1-e^(-.25*5))-(1-e^(-.25*4)) or enter
e^(-.25*4)-e^(-.25*5).
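For readers working without a TI calculator, the same subtraction of cumulative areas can be sketched in Python. The function name `exp_cdf` is an illustrative assumption, not something defined in the text.

```python
import math

def exp_cdf(x, m):
    """Area to the left: P(X < x) = 1 - e^(-m*x) for x >= 0."""
    return 1 - math.exp(-m * x) if x >= 0 else 0.0

m = 0.25
p_4_to_5 = exp_cdf(5, m) - exp_cdf(4, m)     # P(4 < x < 5)

print(round(exp_cdf(5, m), 4), round(exp_cdf(4, m), 4), round(p_4_to_5, 4))
```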


Problem  2 

Half  of  all  customers  are  finished  within  how  long?  (Find  the  50th  percentile) 
Solution 

Find  the  50th  percentile. 

[Figure: graph of f(x) starting at 0.25, with area P(x < k) = 0.50 marked.]


P  {x  <  k)  =  0.50,  k  =  2.8  minutes  (calculator  or  computer) 

Half  of  all  customers  are  finished  within  2.8  minutes. 

You  can  also  do  the  calculation  as  follows: 

P(x < k) = 0.50 and $P(x < k) = 1 - e^{-0.25 k}$

Therefore, $0.50 = 1 - e^{-0.25 k}$ and $e^{-0.25 k} = 1 - 0.50 = 0.5$

Take natural logs: $\ln(e^{-0.25 k}) = \ln(0.50)$. So, $-0.25 k = \ln(0.50)$

Solve for k: $k = \frac{\ln(0.50)}{-0.25} = 2.8$ minutes

NOTE: A formula for the percentile k is $k = \frac{\text{LN}(1 - \text{AreaToTheLeft})}{-m}$ where LN is the natural log.


NOTE: TI-83+ and TI-84: On the home screen, enter LN(1-.50)/-.25. Press the (-) for the negative.
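The percentile formula can be wrapped in a small function. This is a minimal sketch, assuming X ~ Exp(m); the name `exp_percentile` is hypothetical and `math.log` is the natural logarithm.

```python
import math

def exp_percentile(area_left, m):
    """Solve 1 - e^(-m*k) = area_left for k: k = ln(1 - area_left) / (-m)."""
    return math.log(1 - area_left) / (-m)

median = exp_percentile(0.50, 0.25)          # 50th percentile for the postal clerk example
print(round(median, 1))
```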


Problem  3 

Which  is  larger,  the  mean  or  the  median? 
Solution 

Is  the  mean  or  median  larger? 

From Problem 2, the median or 50th percentile is 2.8 minutes. The theoretical mean is 4 minutes. The
mean is larger.


5.4.1  Optional  Collaborative  Classroom  Activity 

Have each class member count the change he/she has in his/her pocket or purse. Your instructor will
record the amounts in dollars and cents. Construct a histogram of the data taken by the class. Use 5
intervals. Draw a smooth curve through the bars. The graph should look approximately exponential. Then
calculate the mean.

Let X = the amount of money a student in your class has in his/her pocket or purse.

The distribution for X is approximately exponential with mean, μ = ______ and m = ______. The standard
deviation, σ = ______.

Draw the appropriate exponential graph. You should label the x and y axes, the decay rate, and the mean.
Shade the area that represents the probability that one student has less than $.40 in his/her pocket or purse.
(Shade P(x < 0.40)).

Example  5.9 

On  the  average,  a  certain  computer  part  lasts  10  years.  The  length  of  time  the  computer  part  lasts 
is  exponentially  distributed. 

Problem  1 

What  is  the  probability  that  a  computer  part  lasts  more  than  7  years? 
Solution 

Let X = the amount of time (in years) a computer part lasts.
$\mu = 10$ so $m = \frac{1}{\mu} = \frac{1}{10} = 0.1$
Find P(x > 7). Draw a graph.
P(x > 7) = 1 - P(x < 7).

Since $P(X < x) = 1 - e^{-m x}$ then $P(X > x) = 1 - (1 - e^{-m x}) = e^{-m x}$

$P(x > 7) = e^{-0.1 \cdot 7} = 0.4966$. The probability that a computer part lasts more than 7 years is 0.4966.
NOTE: TI-83+ and TI-84: On the home screen, enter e^(-.1*7).

[Figure: graph of f(x) with the area P(x > 7) marked; the mean 10 is marked on the x-axis.]


Problem  2 

On  the  average,  how  long  would  5  computer  parts  last  if  they  are  used  one  after  another? 
Solution 

On  the  average,  1  computer  part  lasts  10  years.  Therefore,  5  computer  parts,  if  they  are  used  one 
right  after  the  other  would  last,  on  the  average, 

(5)  (10)  =  50  years. 
Problem  3 

Eighty  percent  of  computer  parts  last  at  most  how  long? 
Solution 

Find  the  80th  percentile.  Draw  a  graph.  Let  k  =  the  80th  percentile. 


[Figure: graph of f(x) with k marked; area P(x < k) = 0.80.]

Solve for k: $k = \frac{\ln(1 - 0.80)}{-0.1} = 16.1$ years

Eighty percent of the computer parts last at most 16.1 years.
NOTE: TI-83+ and TI-84: On the home screen, enter LN(1 - .80)/-.1


Problem  4 

What  is  the  probability  that  a  computer  part  lasts  between  9  and  11  years? 
Solution

Find P(9 < x < 11). Draw a graph.

[Figure: graph of f(x) with the area P(9 < x < 11) between x = 9 and x = 11 marked; μ = 10.]

$P(9 < x < 11) = P(x < 11) - P(x < 9) = (1 - e^{-0.1 \cdot 11}) - (1 - e^{-0.1 \cdot 9}) = 0.6671 - 0.5934 = 0.0737$. (calculator or computer)

The probability that a computer part lasts between 9 and 11 years is 0.0737.
NOTE: TI-83+ and TI-84: On the home screen, enter e^(-.1*9) - e^(-.1*11).
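The calculator entries for the computer part example translate directly into Python. The sketch below is illustrative, with variable names chosen here rather than taken from the text; it covers the survival probability, the 80th percentile, and the probability between 9 and 11 years.

```python
import math

m = 1 / 10                                          # mean lifetime of 10 years

p_more_than_7 = math.exp(-m * 7)                    # P(x > 7) = e^(-m*7)
k80 = math.log(1 - 0.80) / (-m)                     # 80th percentile
p_9_to_11 = math.exp(-m * 9) - math.exp(-m * 11)    # P(9 < x < 11)

print(round(p_more_than_7, 4), round(k80, 1), round(p_9_to_11, 4))
```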


Example  5.10 

Suppose  that  the  length  of  a  phone  call,  in  minutes,  is  an  exponential  random  variable  with  decay 
parameter  =  ■  If  another  person  arrives  at  a  public  telephone  just  before  you,  find  the  probability 
that  you  will  have  to  wait  more  than  5  minutes.  Let  X  =  the  length  of  a  phone  call,  in  minutes. 

Problem  (Solution  on  p.  257.) 

What are m, μ, and σ? The probability that you must wait more than 5 minutes is ______.


NOTE:  A  summary  for  exponential  distribution  is  available  in  "Summary  of  The  Uniform  and 
Exponential  Probability  Distributions  (Section  5.5)". 
5.5 Summary of the Uniform and Exponential Probability Distributions

Formula 5.1: Uniform

X = a real number between a and b (in some instances, X can take on the values a and b). a =
smallest X; b = largest X

X ~ U(a, b)

The mean is $\mu = \frac{a + b}{2}$

The standard deviation is $\sigma = \sqrt{\frac{(b - a)^2}{12}}$

Probability density function: $f(X) = \frac{1}{b - a}$ for a ≤ X ≤ b
Area to the Left of x: P(X < x) = (base)(height)
Area to the Right of x: P(X > x) = (base)(height)

Area Between c and d: P(c < X < d) = (base)(height) = (d - c)(height).

Formula  5.2:  Exponential 

X ~ Exp(m)

X = a real number, 0 or larger. m = the parameter that controls the rate of decay or decline

The mean and standard deviation are the same.

$\mu = \sigma = \frac{1}{m}$ and $m = \frac{1}{\mu} = \frac{1}{\sigma}$

The probability density function: $f(X) = m e^{-m X}$, X ≥ 0

Area to the Left of x: $P(X < x) = 1 - e^{-m x}$

Area to the Right of x: $P(X > x) = e^{-m x}$

Area Between c and d: $P(c < X < d) = P(X < d) - P(X < c) = (1 - e^{-m d}) - (1 - e^{-m c}) = e^{-m c} - e^{-m d}$

Percentile, k: $k = \frac{\text{LN}(1 - \text{AreaToTheLeft})}{-m}$
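As a sanity check on the "Area Between c and d" formula, the area under the exponential density can be approximated with thin rectangles and compared with the closed form. This is only a sketch; the helper `midpoint_area` and the choice of 10,000 rectangles are assumptions made for illustration.

```python
import math

def midpoint_area(f, lo, hi, n=10_000):
    """Approximate the area under f between lo and hi with n midpoint rectangles."""
    width = (hi - lo) / n
    return sum(f(lo + (i + 0.5) * width) for i in range(n)) * width

m, c, d = 0.25, 4, 5
numeric = midpoint_area(lambda x: m * math.exp(-m * x), c, d)
closed_form = math.exp(-m * c) - math.exp(-m * d)

print(round(numeric, 4), round(closed_form, 4))
```

The same (base)(height) idea used for the uniform distribution is at work here, applied to many narrow rectangles instead of one.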


^This  content  is  available  online  at  <http://cnx.org/content/ml6813/1.10/>. 
5.6 Practice 1: Uniform Distribution

5.6.1  Student  Learning  Outcomes 

•  The  student  will  analyze  data  following  a  uniform  distribution. 

5.6.2  Given 

The  age  of  cars  in  the  staff  parking  lot  of  a  suburban  college  is  uniformly  distributed  from  six  months  (0.5 
years)  to  9.5  years. 

5.6.3  Describe  the  Data 

Exercise  5.6.1  (Solution  on  p.  257.) 

What is being measured here?

Exercise  5.6.2  (Solution  on  p.  257.) 

In  words,  define  the  Random  Variable  X. 

Exercise  5.6.3  (Solution  on  p.  257.) 

Are  the  data  discrete  or  continuous? 

Exercise  5.6.4  (Solution  on  p.  257.) 

The  interval  of  values  for  x  is: 

Exercise  5.6.5  (Solution  on  p.  257.) 

The  distribution  for  X  is: 


5.6.4  Probability  Distribution 

Exercise  5.6.6  (Solution  on  p.  257.) 

Write  the  probability  density  function. 

Exercise  5.6.7  (Solution  on  p.  257.) 

Graph  the  probability  distribution. 

a.  Sketch  the  graph  of  the  probability  distribution. 


Figure  5.8 


b. Identify the following values:

i. Lowest value for x:

ii. Highest value for x:

iii. Height of the rectangle:

iv. Label for x-axis (words):

v. Label for y-axis (words):

This content is available online at <http://cnx.org/content/m16812/1.14/>.


5.6.5  Random  Probability 

Exercise  5.6.8  (Solution  on  p.  257.) 

Find  the  probability  that  a  randomly  chosen  car  in  the  lot  was  less  than  4  years  old. 

a.  Sketch  the  graph.  Shade  the  area  of  interest. 


Figure  5.9 


b. Find the probability. P(x < 4) =

Exercise  5.6.9  (Solution  on  p.  257.) 

Out  of  just  the  cars  less  than  7.5  years  old,  find  the  probability  that  a  randomly  chosen  car  in  the 
lot  was  less  than  4  years  old. 

a.  Sketch  the  graph.  Shade  the  area  of  interest. 
Figure 5.10


b. Find the probability. P(x < 4 | x < 7.5) =
Exercise  5.6.10:  Discussion  Question 

What  has  changed  in  the  previous  two  problems  that  made  the  solutions  different? 


5.6.6  Quartiles 

Exercise  5.6.11  (Solution  on  p.  257.) 

Find  the  average  age  of  the  cars  in  the  lot. 

Exercise  5.6.12  (Solution  on  p.  257.) 

Find the third quartile of ages of cars in the lot. This means you will have to find the value such
that 3/4, or 75%, of the cars are at most (less than or equal to) that age.

a.  Sketch  the  graph.  Shade  the  area  of  interest. 


Figure  5.11 


b. Find the value k such that P(x < k) = 0.75.

c.  The  third  quartile  is: 
5.7 Practice 2: Exponential Distribution

5.7.1  Student  Learning  Outcomes 

•  The  student  will  analyze  data  following  the  exponential  distribution. 

5.7.2  Given 

Carbon-14 is a radioactive element with a half-life of about 5730 years. Carbon-14 is said to decay
exponentially. The decay rate is 0.000121. We start with 1 gram of carbon-14. We are interested in the time (years)
it takes to decay carbon-14.

5.7.3  Describe  the  Data 
Exercise  5.7.1 

What  is  being  measured  here? 

Exercise  5.7.2  (Solution  on  p.  258.) 

Are  the  data  discrete  or  continuous? 

Exercise  5.7.3  (Solution  on  p.  258.) 

In  words,  define  the  Random  Variable  X. 

Exercise  5.7.4  (Solution  on  p.  258.) 

What  is  the  decay  rate  (m)? 

Exercise  5.7.5  (Solution  on  p.  258.) 

The  distribution  for  X  is: 


5.7.4  Probability 

Exercise  5.7.6  (Solution  on  p.  258.) 

Find the amount (percent of 1 gram) of carbon-14 lasting less than 5730 years. This means, find
P(x < 5730).

a.  Sketch  the  graph.  Shade  the  area  of  interest. 


Figure  5.12 


This content is available online at <http://cnx.org/content/m16811/1.11/>.
b. Find the probability. P(x < 5730) =

Exercise  5.7.7  (Solution  on  p.  258.) 

Find  the  percentage  of  carbon-14  lasting  longer  than  10,000  years. 

a.  Sketch  the  graph.  Shade  the  area  of  interest. 


Figure  5.13 


b. Find the probability. P(x > 10000) =

Exercise  5.7.8  (Solution  on  p.  258.) 

Thirty  percent  (30%)  of  carbon-14  will  decay  within  how  many  years? 

a.  Sketch  the  graph.  Shade  the  area  of  interest. 


Figure  5.14 


b. Find the value k such that P(x < k) = 0.30.
5.8 Homework

For  each  probability  and  percentile  problem,  DRAW  THE  PICTURE! 
Exercise  5.8.1 

Consider the following experiment. You are one of 100 people enlisted to take part in a study to
determine the percent of nurses in America with an R.N. (registered nurse) degree. You ask nurses
if they have an R.N. degree. The nurses answer "yes" or "no." You then calculate the percentage
of nurses with an R.N. degree. You give that percentage to your supervisor.

a.  What  part  of  the  experiment  will  yield  discrete  data? 

b.  What  part  of  the  experiment  will  yield  continuous  data? 

Exercise  5.8.2 

When  age  is  rounded  to  the  nearest  year,  do  the  data  stay  continuous,  or  do  they  become  discrete? 
Why? 

Exercise  5.8.3  (Solution  on  p.  258.) 

Births  are  approximately  uniformly  distributed  between  the  52  weeks  of  the  year.  They  can  be 
said  to  follow  a  Uniform  Distribution  from  1-53  (spread  of  52  weeks). 

a. X ~

b. Graph the probability distribution.

c. f(x) =

d. μ =

e. σ =

f. Find the probability that a person is born at the exact moment week 19 starts. That is, find
P(x = 19) =

g. P(2 < x < 31) =

h. Find the probability that a person is born after week 40.

i. P(12 < x | x < 28) =

j. Find the 70th percentile.

k. Find the minimum for the upper quarter.

Exercise  5.8.4 

A random number generator picks a number from 1 to 9 in a uniform manner.

a. X ~

b. Graph the probability distribution.

c. f(x) =

d. μ =

e. σ =

f. P(3.5 < x < 7.25) =

g. P(x > 5.67) =

h. P(x > 5 | x > 3) =

i.  Find  the  90th  percentile. 

Exercise  5.8.5  (Solution  on  p.  258.) 

The time (in minutes) until the next bus departs a major bus depot follows a distribution with
$f(x) = \frac{1}{20}$ where x goes from 25 to 45 minutes.

a. Define the random variable. X =

b. X ~

c. Graph the probability distribution.

d. The distribution is ______ (name of distribution). It is ______ (discrete or continuous).

e. μ =

f. σ =

g. Find the probability that the time is at most 30 minutes. Sketch and label a graph of the
distribution. Shade the area of interest. Write the answer in a probability statement.

h. Find the probability that the time is between 30 and 40 minutes. Sketch and label a graph of
the distribution. Shade the area of interest. Write the answer in a probability statement.

i. P(25 < x < 55) = ______. State this in a probability statement (similar to g and h), draw
the picture, and find the probability.

j. Find the 90th percentile. This means that 90% of the time, the time is less than ______ minutes.

k. Find the 75th percentile. In a complete sentence, state what this means. (See j.)

l. Find the probability that the time is more than 40 minutes given (or knowing that) it is at least
30 minutes.

This content is available online at <http://cnx.org/content/m16807/1.14/>.

Exercise  5.8.6 

According  to  a  study  by  Dr.  John  McDougall  of  his  live-in  weight  loss  program  at  St.  Helena 
Hospital,  the  people  who  follow  his  program  lose  between  6  and  15  pounds  a  month  until  they 
approach  trim  body  weight.  Let's  suppose  that  the  weight  loss  is  uniformly  distributed.  We  are 
interested  in  the  weight  loss  of  a  randomly  selected  individual  following  the  program  for  one 
month.  (Source:  The  McDougall  Program  for  Maximum  Weight  Loss  by  John  A.  McDougall, 
M.D.) 

a. Define the random variable. X =

b. X ~

c. Graph the probability distribution.

d. f(x) =

e. μ =

f. σ =

g. Find the probability that the individual lost more than 10 pounds in a month.

h. Suppose it is known that the individual lost more than 10 pounds in a month. Find the
probability that he lost less than 12 pounds in the month.

i. P(7 < x < 13 | x > 9) = ______. State this in a probability question (similar to g and h),
draw the picture, and find the probability.

Exercise  5.8.7  (Solution  on  p.  258.) 

A  subway  train  on  the  Red  Line  arrives  every  8  minutes  during  rush  hour.  We  are  interested  in  the 
length  of  time  a  commuter  must  wait  for  a  train  to  arrive.  The  time  follows  a  uniform  distribution. 

a. Define the random variable. X =

b. X ~

c. Graph the probability distribution.

d. f(x) =

e. μ =

f. σ =

g. Find the probability that the commuter waits less than one minute.

h. Find the probability that the commuter waits between three and four minutes.

i. 60% of commuters wait more than how long for the train? State this in a probability question
(similar to g and h), draw the picture, and find the probability.
Exercise 5.8.8

The age of a first grader on September 1 at Garden Elementary School is uniformly distributed
from 5.8 to 6.8 years. We randomly select one first grader from the class.

a. Define the random variable. X =

b. X ~

c. Graph the probability distribution.

d. f(x) =

e. μ =

f. σ =

g.  Find  the  probability  that  she  is  over  6.5  years. 

h.  Find  the  probability  that  she  is  between  4  and  6  years. 

i.  Find  the  70th  percentile  for  the  age  of  first  graders  on  September  1  at  Garden  Elementary  School. 

Exercise  5.8.9  (Solution  on  p.  259.) 

Let  X~Exp(0.1) 

a.  decay  rate= 

b.  }i  = 

c. Graph the probability distribution function.

d.  On  the  above  graph,  shade  the  area  corresponding  to  P  (x  <  6)  and  find  the  probability. 

e.  Sketch  a  new  graph,  shade  the  area  corresponding  to  P  (3  <  x  <  6)  and  find  the  probability. 

f.  Sketch  a  new  graph,  shade  the  area  corresponding  to  P  (x  >  7)  and  find  the  probability. 

g.  Sketch  a  new  graph,  shade  the  area  corresponding  to  the  40th  percentile  and  find  the  value. 

h.  Find  the  average  value  of  x. 

Exercise  5.8.10 

Suppose  that  the  length  of  long  distance  phone  calls,  measured  in  minutes,  is  known  to  have  an 
exponential  distribution  with  the  average  length  of  a  call  equal  to  8  minutes. 

a. Define the random variable. X =

b. Is X continuous or discrete?

c. X ~

d. μ =

e. σ =

f. Draw a graph of the probability distribution. Label the axes.

g.  Find  the  probability  that  a  phone  call  lasts  less  than  9  minutes. 

h.  Find  the  probability  that  a  phone  call  lasts  more  than  9  minutes. 

i.  Find  the  probability  that  a  phone  call  lasts  between  7  and  9  minutes. 

j.  If  25  phone  calls  are  made  one  after  another,  on  average,  what  would  you  expect  the  total  to  be? 
Why? 

Exercise  5.8.11  (Solution  on  p.  259.) 

Suppose  that  the  useful  life  of  a  particular  car  battery,  measured  in  months,  decays  with  parameter 
0.025.  We  are  interested  in  the  life  of  the  battery. 

a. Define the random variable. X =

b. Is X continuous or discrete?

c. X ~

d. On average, how long would you expect 1 car battery to last?

e. On average, how long would you expect 9 car batteries to last, if they are used one after another?

f. Find the probability that a car battery lasts more than 36 months.

g. 70% of the batteries last at least how long?
Exercise  5.8.12 

The percent of persons (ages 5 and older) in each state who speak a language at home other than
English is approximately exponentially distributed with a mean of 9.848. Suppose we randomly
pick a state. (Source: Bureau of the Census, U.S. Dept. of Commerce)

a. Define the random variable. X =

b. Is X continuous or discrete?

c. X ~

d. μ =

e. σ =

f. Draw a graph of the probability distribution. Label the axes.

g. Find the probability that the percent is less than 12.

h. Find the probability that the percent is between 8 and 14.

i. The percent of all individuals living in the United States who speak a language at home other
than English is 13.8.

i. Why is this number different from 9.848%?

ii. What would make this number higher than 9.848%?

Exercise  5.8.13  (Solution  on  p.  259.) 

The  time  (in  years)  after  reaching  age  60  that  it  takes  an  individual  to  retire  is  approximately 
exponentially  distributed  with  a  mean  of  about  5  years.  Suppose  we  randomly  pick  one  retired 
individual.  We  are  interested  in  the  time  after  age  60  to  retirement. 

a. Define the random variable. X =

b. Is X continuous or discrete?

c. X ~

d. μ =

e. σ =

f. Draw a graph of the probability distribution. Label the axes.

g.  Find  the  probability  that  the  person  retired  after  age  70. 

h.  Do  more  people  retire  before  age  65  or  after  age  65? 

i.  In  a  room  of  1000  people  over  age  80,  how  many  do  you  expect  will  NOT  have  retired  yet? 
Exercise  5.8.14 

The cost of all maintenance for a car during its first year is approximately exponentially
distributed with a mean of $150.

a.  Define the random variable. X =

b.  X ~

c.  μ =

d.  σ =

e.  Draw a graph of the probability distribution. Label the axes.

f.  Find the probability that a car required over $300 for maintenance during its first year.

5.8.1  Try these multiple choice problems

The  next  three  questions  refer  to  the  following  information.  The  average  lifetime  of  a  certain  new  cell 
phone  is  3  years.  The  manufacturer  will  replace  any  cell  phone  failing  within  2  years  of  the  date  of  purchase. 
The lifetime of these cell phones is known to follow an exponential distribution.

Exercise  5.8.15  (Solution  on  p.  259.) 

The  decay  rate  is 

A.  0.3333 

B.  0.5000 

C.  2.0000 

D.  3.0000 


Exercise  5.8.16  (Solution  on  p.  259.) 

What is the probability that a phone will fail within 2 years of the date of purchase?

A.  0.8647

B.  0.4866

C.  0.2212

D.  0.9997


Exercise  5.8.17  (Solution  on  p.  259.) 

What  is  the  median  lifetime  of  these  phones  (in  years)? 

A.  0.1941 

B.  1.3863 

C.  2.0794 

D.  5.5452 


The  next  three  questions  refer  to  the  following  information.  The  Sky  Train  from  the  terminal  to  the  rental 
car  and  long  term  parking  center  is  supposed  to  arrive  every  8  minutes.  The  waiting  times  for  the  train  are 
known  to  follow  a  uniform  distribution. 

Exercise  5.8.18  (Solution  on  p.  259.) 

What  is  the  average  waiting  time  (in  minutes)? 

A.  0.0000 

B.  2.0000 

C.  3.0000 

D.  4.0000 


Exercise  5.8.19  (Solution  on  p.  259.) 

Find  the  30th  percentile  for  the  waiting  times  (in  minutes). 

A.  2.0000 

B.  2.4000 

C.  2.750 

D.  3.000 


Exercise  5.8.20  (Solution  on  p.  259.) 

The  probability  of  waiting  more  than  7  minutes  given  a  person  has  waited  more  than  4  minutes 
is? 


A.  0.1250 



B.  0.2500 

C.  0.5000 

D.  0.7500 
5.9  Review

Exercise  5.9.1  -  Exercise  5.9.7  refer  to  the  following  study:  A  recent  study  of  mothers  of  junior  high  school 
children  in  Santa  Clara  County  reported  that  76%  of  the  mothers  are  employed  in  paid  positions.  Of  those 
mothers who are employed, 64% work full-time (over 35 hours per week), and 36% work part-time. However,
out of all of the mothers in the population, 49% work full-time. The population under study is made
up of mothers of junior high school children in Santa Clara County.

Let E = employed. Let F = full-time employment.

Exercise 5.9.1  (Solution on p. 259.)

a.  Find the percent of all mothers in the population that are NOT employed.

b.  Find  the  percent  of  mothers  in  the  population  that  are  employed  part-time. 

Exercise  5.9.2  (Solution  on  p.  259.) 

The  type  of  employment  is  considered  to  be  what  type  of  data? 

Exercise  5.9.3  (Solution  on  p.  259.) 

Find  the  probability  that  a  randomly  selected  mother  works  part-time  given  that  she  is  employed. 

Exercise  5.9.4  (Solution  on  p.  260.) 

Find  the  probability  that  a  randomly  selected  person  from  the  population  will  be  employed  OR 
work  full-time. 

Exercise  5.9.5  (Solution  on  p.  260.) 

Based  upon  the  above  information,  are  being  employed  AND  working  part-time: 

a.  mutually  exclusive  events?  Why  or  why  not? 

b.  independent  events?  Why  or  why  not? 

Exercise 5.9.6 - Exercise 5.9.7 refer to the following: We randomly pick 10 mothers from the above
population. We are interested in the number of the mothers that are employed. Let X = number of mothers that
are employed.

Exercise  5.9.6  (Solution  on  p.  260.) 

State  the  distribution  for  X. 

Exercise  5.9.7  (Solution  on  p.  260.) 

Find  the  probability  that  at  least  6  are  employed. 

Exercise  5.9.8  (Solution  on  p.  260.) 

We  expect  the  Statistics  Discussion  Board  to  have,  on  average,  14  questions  posted  to  it  per  week. 
We  are  interested  in  the  number  of  questions  posted  to  it  per  day. 

a.  Define  X. 

b.  What  are  the  values  that  the  random  variable  may  take  on? 

c.  State  the  distribution  for  X. 

d.  Find the probability that from 10 to 14 (inclusive) questions are posted to the Listserv on a
randomly picked day.

Exercise  5.9.9  (Solution  on  p.  260.) 

A  person  invests  $1000  in  stock  of  a  company  that  hopes  to  go  public  in  1  year. 

•  The  probability  that  the  person  will  lose  all  his  money  after  1  year  (i.e.  his  stock  will  be 
worthless)  is  35%. 

This content is available online at <http://cnx.org/content/m16810/1.11/>.

•  The probability that the person's stock will still have a value of $1000 after 1 year (i.e. no
profit  and  no  loss)  is  60%. 

•  The  probability  that  the  person's  stock  will  increase  in  value  by  $10,000  after  1  year  (i.e.  will 
be  worth  $11,000)  is  5%. 

Find  the  expected  PROFIT  after  1  year. 

Exercise  5.9.10  (Solution  on  p.  260.) 

Rachel's  piano  cost  $3000.  The  average  cost  for  a  piano  is  $4000  with  a  standard  deviation  of 
$2500.  Becca's  guitar  cost  $550.  The  average  cost  for  a  guitar  is  $500  with  a  standard  deviation 
of  $200.  Matt's  drums  cost  $600.  The  average  cost  for  drums  is  $700  with  a  standard  deviation  of 
$100.  Whose  cost  was  lowest  when  compared  to  his  or  her  own  instrument?  Justify  your  answer. 

Exercise  5.9.11  (Solution  on  p.  260.) 


[Figure: box plot on a number line with the values 0, 2, 4, 5, and 7 marked]

For  each  statement  below,  explain  why  each  is  either  true  or  false. 

a.  25%  of  the  data  are  at  most  5. 

b.  There  is  the  same  amount  of  data  from  4  -  5  as  there  is  from  5-7. 

c.  There  are  no  data  values  of  3. 

d.  50%  of  the  data  are  4. 


Exercise  5.9.12  -  Exercise  5.9.13  refer  to  the  following:  64  faculty  members  were  asked  the  number  of 
cars  they  owned  (including  spouse  and  children's  cars).  The  results  are  given  in  the  following  graph: 

[Figure: relative frequency histogram; vertical axis: relative frequency (ticks at 0.15, 0.25, 0.35, 0.45); horizontal axis: number of cars]

Exercise 5.9.12  (Solution on p. 260.)

Find  the  approximate  number  of  responses  that  were  "3." 

Exercise  5.9.13  (Solution  on  p.  260.) 

Find  the  first,  second  and  third  quartiles.  Use  them  to  construct  a  box  plot  of  the  data. 

Exercise 5.9.14 - Exercise 5.9.15 refer to the following study done of the Girls soccer team "Snow Leopards":

Hair Style    Hair Color
              blond    brown    black
ponytail      3        2        5
plain         2        2        1

Table 5.2

Suppose  that  one  girl  from  the  Snow  Leopards  is  randomly  selected. 

Exercise  5.9.14  (Solution  on  p.  260.) 

Find  the  probability  that  the  girl  has  black  hair  GIVEN  that  she  wears  a  ponytail. 

Exercise  5.9.15  (Solution  on  p.  260.) 

Find  the  probability  that  the  girl  wears  her  hair  plain  OR  has  brown  hair. 

Exercise  5.9.16  (Solution  on  p.  260.) 

Find  the  probability  that  the  girl  has  blond  hair  AND  that  she  wears  her  hair  plain. 
5.10  Lab: Continuous Distribution

Class  Time: 
Names: 

5.10.1  Student  Learning  Outcomes: 

•  The  student  will  compare  and  contrast  empirical  data  from  a  random  number  generator  with  the 
Uniform  Distribution. 


5.10.2  Collect  the  Data 

Use  a  random  number  generator  to  generate  50  values  between  0  and  1  (inclusive).  List  them  below.  Round 
the  numbers  to  4  decimal  places  or  set  the  calculator  MODE  to  4  places. 
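
If a calculator is not at hand, the data collection step can also be sketched in Python. This is only an illustration and not part of the original lab: the helper names generate_sample and summarize are made up for this example, the seed is arbitrary, and random.random() returns values in [0, 1), so the value 1 itself never appears. The quartile convention used by statistics.quantiles may differ slightly from the one your calculator uses.

```python
import random
import statistics


def generate_sample(n=50, seed=None):
    """Return n pseudo-random values from [0, 1), rounded to 4 decimal places."""
    rng = random.Random(seed)  # a fixed seed makes the run repeatable
    return [round(rng.random(), 4) for _ in range(n)]


def summarize(data):
    """Return the sample mean, sample standard deviation, and the three quartiles."""
    q1, median, q3 = statistics.quantiles(data, n=4)
    return {
        "mean": statistics.mean(data),
        "s": statistics.stdev(data),
        "q1": q1,
        "median": median,
        "q3": q3,
    }


sample = generate_sample(50, seed=1)  # the seed value is an arbitrary choice
summary = summarize(sample)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

Running the sketch with a different seed gives a different sample, which is the point of the lab: the empirical values should be near, but not equal to, the theoretical ones.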

1.  Complete  the  table: 


Table  5.3 

2.  Calculate  the  following: 

a.  x̄ =

b.  s  = 

c.  1st  quartile  = 

d.  3rd  quartile  = 

e.  Median  = 

5.10.3  Organize  the  Data 

1.  Construct  a  histogram  of  the  empirical  data.  Make  8  bars. 
This content is available online at <http://cnx.org/content/m16803/1.13/>.

[Blank axes for the histogram: vertical axis Relative Frequency, horizontal axis X]

Figure 5.15


2.  Construct  a  histogram  of  the  empirical  data.  Make  5  bars. 


[Blank axes for the histogram: vertical axis Relative Frequency, horizontal axis X]

Figure 5.16
5.10.4  Describe the Data

1.  Describe the shape of each graph. Use 2-3 complete sentences. (Keep it simple. Does the graph go
straight across, does it have a V shape, does it have a hump in the middle or at either end, etc.? One
way to help you determine a shape is to roughly draw a smooth curve through the top of the bars.)

2.  Describe how changing the number of bars might change the shape.


5.10.5  Theoretical  Distribution 

1.  In  words,  X  = 

2.  The  theoretical  distribution  of  X  is  X  ~  U  (0, 1).  Use  it  for  this  part. 

3.  In  theory,  based  upon  the  distribution  X  ~  U  (0, 1),  complete  the  following. 

a.  μ =

b.  σ =

c.  1st  quartile  = 

d.  3rd  quartile  = 

e.  median  =  

4.  Are  the  empirical  values  (the  data)  in  the  section  titled  "Collect  the  Data"  close  to  the  corresponding 
theoretical  values  above?  Why  or  why  not? 


5.10.6  Plot  the  Data 

1.  Construct  a  box  plot  of  the  data.  Be  sure  to  use  a  ruler  to  scale  accurately  and  draw  straight  edges. 

2.  Do you notice any potential outliers? If so, which values are they? Either way, numerically justify your
answer. (Recall that any DATA that are less than Q1 - 1.5*IQR or more than Q3 + 1.5*IQR are potential
outliers. IQR means interquartile range.)
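
The outlier rule just recalled can be checked numerically with a short Python sketch. It is an illustration only: the function names outlier_fences and potential_outliers and the ten data values are hypothetical, and statistics.quantiles may compute quartiles slightly differently from a TI calculator.

```python
import statistics


def outlier_fences(data):
    """Return (Q1 - 1.5*IQR, Q3 + 1.5*IQR) for the data."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    iqr = q3 - q1  # interquartile range
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def potential_outliers(data):
    """Return the values that fall below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(data)
    return [x for x in data if x < lower or x > upper]


# Hypothetical data, not taken from the lab; 2.50 is planted as an obvious outlier.
example = [0.12, 0.35, 0.41, 0.48, 0.52, 0.57, 0.63, 0.70, 0.88, 2.50]
print(outlier_fences(example))
print(potential_outliers(example))  # [2.5]
```

For values generated between 0 and 1, both fences often fall outside that interval, in which case no value can be flagged.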


5.10.7  Compare  the  Data 

1.  For  each  part  below,  use  a  complete  sentence  to  comment  on  how  the  value  obtained  from  the  data 
compares  to  the  theoretical  value  you  expected  from  the  distribution  in  the  section  titled  "Theoretical 
Distribution." 

a.  minimum  value: 

b.  1st  quartile: 

c.  median: 

d.  third  quartile: 

e.  maximum  value: 

f.  width  of  IQR: 

g.  overall  shape: 

2.  Based  on  your  comments  in  the  section  titled  "Collect  the  Data",  how  does  the  box  plot  fit  or  not  fit 
what  you  would  expect  of  the  distribution  in  the  section  titled  "Theoretical  Distribution?" 


5.10.8  Discussion  Question 

1.  Suppose that the number of values generated was 500, not 50. How would that affect what you would
expect the empirical data to be and the shape of its graph to look like?

Solutions to Exercises in Chapter 5

Solution  to  Example  5.5,  Problem  1  (p.  229) 

0.5714 

Solution  to  Example  5.5,  Problem  2  (p.  229) 

4/5

Solution to Example 5.10, Problem (p. 238)

•  μ = 12

•  σ = 12

P(x > 5) = 0.6592


Solutions  to  Practice  1:  Uniform  Distribution 

Solution  to  Exercise  5.6.1  (p.  240) 

The  age  of  cars  in  the  staff  parking  lot 
Solution  to  Exercise  5.6.2  (p.  240) 

X  =  The  age  (in  years)  of  cars  in  the  staff  parking  lot 
Solution  to  Exercise  5.6.3  (p.  240) 

Continuous 

Solution  to  Exercise  5.6.4  (p.  240) 

0.5-9.5 

Solution  to  Exercise  5.6.5  (p.  240) 

X ~ U(0.5, 9.5)

Solution to Exercise 5.6.6 (p. 240)

f(x) = 1/9

Solution  to  Exercise  5.6.7  (p.  240) 

b.i.  0.5 
b.ii.  9.5 
b.iii.  5 

b.iv.  Age  of  Cars 
b.v.  f(x)

Solution  to  Exercise  5.6.8  (p.  241) 
u.  33 

D..  g 

Solution  to  Exercise  5.6.9  (p.  241) 
b:  ^ 

Solution  to  Exercise  5.6.11  (p.  242) 

μ = 5

Solution to Exercise 5.6.12 (p. 242)
b.  k = 7.25

Solutions to Practice 2: Exponential Distribution

Solution  to  Exercise  5.7.2  (p.  243) 
Continuous 

Solution  to  Exercise  5.7.3  (p.  243) 

X  =  Time  (years)  to  decay  carbon-14 
Solution  to  Exercise  5.7.4  (p.  243) 

m  =  0.000121 

Solution  to  Exercise  5.7.5  (p.  243) 

X  ~  Exp(0.000121) 

Solution  to  Exercise  5.7.6  (p.  243) 

b.  P(x < 5730) = 0.5001

Solution to Exercise 5.7.7 (p. 244)

b.  P(x > 10000) = 0.2982
Solution  to  Exercise  5.7.8  (p.  244) 

b.  k  =  2947.73 

Solutions  to  Homework 
Solution  to  Exercise  5.8.3  (p.  245) 

a.  X ~ U(1, 53)

c.  f(x) = 1/52 where 1 ≤ x ≤ 53

d.  27 

e.  15.01 

f.  0 

S-  52 

h  15 

52 

i  ^ 

1.  27 

j.  37.4 
k.  40 

Solution  to  Exercise  5.8.5  (p.  245) 

b.  X ~ U(25, 45)

d.  uniform;  continuous 

e.  35  minutes 

f.  5.8  minutes 

g.  0.25 

h.  0.5 

i.  1 

j.  43  minutes 
k.  40  minutes 
l.  0.3333

Solution  to  Exercise  5.8.7  (p.  246) 

b.  X ~ U(0, 8)

d.  f(x) = 1/8 where 0 ≤ x ≤ 8

e.  4

f.  2.31

8-^ 

h.  1 

i.  3.2 

Solution  to  Exercise  5.8.9  (p.  247) 

a.  0.1 

b.  10 

d.  0.4512 

e.  0.1920 

f.  0.4966 

g.  5.11 

h.  10 

Solution  to  Exercise  5.8.11  (p.  247) 

c.  X~Exp  (0.025) 

d.  40  months 

e.  360  months 

f.  0.4066 

g.  14.27 

Solution  to  Exercise  5.8.13  (p.  248) 

c.  X ~ Exp(1/5)

d.  5 

e.  5 

g.  0.1353 

h.  Before 

i.  18.3 

Solution  to  Exercise  5.8.15  (p.  249) 

A 

Solution  to  Exercise  5.8.16  (p.  249) 

B 

Solution  to  Exercise  5.8.17  (p.  249) 

C 

Solution  to  Exercise  5.8.18  (p.  249) 
D 

Solution  to  Exercise  5.8.19  (p.  249) 

B 

Solution  to  Exercise  5.8.20  (p.  249) 
B 

Solutions  to  Review 
Solution  to  Exercise  5.9.1  (p.  251) 

a.  24% 

b.  27% 

Solution  to  Exercise  5.9.2  (p.  251) 

Qualitative 
Solution to Exercise 5.9.3 (p. 251)
0.36 

Solution  to  Exercise  5.9.4  (p.  251) 

0.7636 

Solution  to  Exercise  5.9.5  (p.  251) 

a.  No, 

b.  No, 

Solution  to  Exercise  5.9.6  (p.  251) 

B(10, 0.76)

Solution  to  Exercise  5.9.7  (p.  251) 
0.9330 

Solution  to  Exercise  5.9.8  (p.  251) 

a.  X  =  the  number  of  questions  posted  to  the  Statistics  Listserv  per  day 

b.  X  =  0,1,2,... 

c.  X~P(2) 

d.  0 

Solution  to  Exercise  5.9.9  (p.  251) 

$150 

Solution  to  Exercise  5.9.10  (p.  252) 

Matt 

Solution  to  Exercise  5.9.11  (p.  252) 

a.  False 

b.  True 

c.  False 

d.  False 

Solution  to  Exercise  5.9.12  (p.  252) 

16 

Solution  to  Exercise  5.9.13  (p.  252) 

2,2,3 

Solution  to  Exercise  5.9.14  (p.  253) 

5/10

Solution to Exercise 5.9.15 (p. 253)

7/15

Solution to Exercise 5.9.16 (p. 253)

2/15
The Normal Distribution


6.1  The Normal Distribution

6.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

•  Recognize  the  normal  probability  distribution  and  apply  it  appropriately 

•  Recognize  the  standard  normal  probability  distribution  and  apply  it  appropriately. 

•  Compare  normal  probabilities  by  converting  to  the  standard  normal  distribution. 


6.1.2  Introduction 

The  normal,  a  continuous  distribution,  is  the  most  important  of  all  the  distributions.  It  is  widely  used 
and  even  more  widely  abused.  Its  graph  is  bell-shaped.  You  see  the  bell  curve  in  almost  all  disciplines. 
Some  of  these  include  psychology,  business,  economics,  the  sciences,  nursing,  and,  of  course,  mathematics. 
Some  of  your  instructors  may  use  the  normal  distribution  to  help  determine  your  grade.  Most  IQ  scores  are 
normally  distributed.  Often  real  estate  prices  fit  a  normal  distribution.  The  normal  distribution  is  extremely 
important  but  it  cannot  be  applied  to  everything  in  the  real  world. 

In  this  chapter,  you  will  study  the  normal  distribution,  the  standard  normal,  and  applications  associated 
with  them. 

6.1.3  Optional  Collaborative  Classroom  Activity 

Your  instructor  will  record  the  heights  of  both  men  and  women  in  your  class,  separately.  Draw  histograms 
of  your  data.  Then  draw  a  smooth  curve  through  each  histogram.  Is  each  curve  somewhat  bell-shaped?  Do 
you  think  that  if  you  had  recorded  200  data  values  for  men  and  200  for  women  that  the  curves  would  look 
bell-shaped?  Calculate  the  mean  for  each  data  set.  Write  the  means  on  the  x-axis  of  the  appropriate  graph 
below  the  peak.  Shade  the  approximate  area  that  represents  the  probability  that  one  randomly  chosen 
male  is  taller  than  72  inches.  Shade  the  approximate  area  that  represents  the  probability  that  one  randomly 
chosen  female  is  shorter  than  60  inches.  If  the  total  area  under  each  curve  is  one,  does  either  probability 
appear  to  be  more  than  0.5? 


This content is available online at <http://cnx.org/content/m16979/1.12/>.

The normal distribution has two parameters (two numerical descriptive measures), the mean (μ) and the
standard deviation (σ). If X is a quantity to be measured that has a normal distribution with mean (μ) and
standard deviation (σ), we designate this by writing


NORMAL: X ~ N(μ, σ)


The probability density function is a rather complicated function. Do not memorize it. It is not necessary.

$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^{2}}$

The cumulative distribution function is P(X < x). It is calculated either by a calculator or a computer or
it  is  looked  up  in  a  table.  Technology  has  made  the  tables  basically  obsolete.  For  that  reason,  as  well  as 
the  fact  that  there  are  various  table  formats,  we  are  not  including  table  instructions  in  this  chapter.  See  the 
NOTE  in  this  chapter  in  Calculation  of  Probabilities. 
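
For readers who work on a computer rather than a calculator, the density and the cumulative probability P(X < x) can be evaluated with a few lines of Python. The sketch below is not part of the original text: normal_pdf is a hypothetical helper name, and the values μ = 5 and σ = 2 are chosen only for illustration. NormalDist comes from Python's standard statistics module.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu, sigma):
    """Evaluate the normal probability density function at x."""
    coefficient = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)


# Illustrative parameters (an assumption for this sketch): mean 5, standard deviation 2.
dist = NormalDist(mu=5, sigma=2)

# The hand-written density agrees with the library's pdf method.
print(normal_pdf(7, 5, 2), dist.pdf(7))

# The cumulative distribution function gives P(X < x); at the mean it is one half.
print(dist.cdf(5))  # 0.5
```

The value 0.5 at the mean reflects the symmetry of the curve about a vertical line through the mean.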

The curve is symmetrical about a vertical line drawn through the mean, μ. In theory, the mean is the same
as the median since the graph is symmetric about μ. As the notation indicates, the normal distribution
depends only on the mean and the standard deviation. Since the area under the curve must equal one, a
change in the standard deviation, σ, causes a change in the shape of the curve; the curve becomes fatter or
skinnier depending on σ. A change in μ causes the graph to shift to the left or right. This means there are an
infinite number of normal probability distributions. One of special interest is called the standard normal
distribution.


6.2  The Standard Normal Distribution

The  standard  normal  distribution  is  a  normal  distribution  of  standardized  values  called  z-scores.  A  z- 
score  is  measured  in  units  of  the  standard  deviation.  For  example,  if  the  mean  of  a  normal  distribution  is 
5  and  the  standard  deviation  is  2,  the  value  11  is  3  standard  deviations  above  (or  to  the  right  of)  the  mean. 
The  calculation  is: 

$x = \mu + (z)\sigma = 5 + (3)(2) = 11$  (6.1)

The  z-score  is  3. 

The mean for the standard normal distribution is 0 and the standard deviation is 1. The transformation
$z = \frac{x - \mu}{\sigma}$ produces the distribution Z ~ N(0, 1). The value x comes from a normal distribution with
mean μ and standard deviation σ.
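
A brief Python sketch can make the transformation concrete. It reuses the numbers from the example above (mean 5, standard deviation 2, value 11); the variable names are only illustrative. It shows that the area to the left of x under N(5, 2) equals the area to the left of the z-score under N(0, 1).

```python
from statistics import NormalDist

mu, sigma = 5, 2  # parameters of the original normal distribution
x = 11            # a value from that distribution

# Standardize: how many standard deviations x lies from the mean.
z = (x - mu) / sigma
print(z)  # 3.0

# Area to the left of x under N(5, 2) ...
p_x = NormalDist(mu, sigma).cdf(x)
# ... equals the area to the left of z under the standard normal N(0, 1).
p_z = NormalDist().cdf(z)
print(abs(p_x - p_z) < 1e-12)  # True
```

This equality is why a single standard normal table (or routine) can serve every normal distribution.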

This content is available online at <http://cnx.org/content/m16986/1.7/>.

6.3  Z-scores

If X is a normally distributed random variable and X ~ N(μ, σ), then the z-score is:

$z = \frac{x - \mu}{\sigma}$  (6.2)

The z-score tells you how many standard deviations the value x is above (to the right of) or below
(to the left of) the mean, μ. Values of x that are larger than the mean have positive z-scores and values of x
that are smaller than the mean have negative z-scores. If x equals the mean, then x has a z-score of 0.

Example  6.1 

Suppose X ~ N(5, 6). This says that X is a normally distributed random variable with mean
μ = 5 and standard deviation σ = 6. Suppose x = 17. Then:

$z = \frac{x - \mu}{\sigma} = \frac{17 - 5}{6} = 2$  (6.3)

This means that x = 17 is 2 standard deviations (2σ) above or to the right of the mean μ = 5.
The standard deviation is σ = 6.

Notice that:

$5 + 2 \cdot 6 = 17$  (The pattern is μ + zσ = x.)  (6.4)

Now suppose x = 1. Then:

$z = \frac{x - \mu}{\sigma} = \frac{1 - 5}{6} = -0.67$  (rounded to two decimal places)  (6.5)

This means that x = 1 is 0.67 standard deviations (-0.67σ) below or to the left of the mean
μ = 5. Notice that:

5 + (-0.67)(6) is approximately equal to 1  (This has the pattern μ + (-0.67)σ = 1.)

Summarizing, when z is positive, x is above or to the right of μ and when z is negative, x is to the
left of or below μ.
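
The two calculations in Example 6.1 can be reproduced with a small Python sketch. The function names z_score and x_from_z are hypothetical and chosen only for this illustration; the numbers come from the example.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


def x_from_z(z, mu, sigma):
    """Recover the original value from a z-score using the pattern mu + z*sigma."""
    return mu + z * sigma


# X ~ N(5, 6) as in Example 6.1
print(z_score(17, 5, 6))           # 2.0
print(round(z_score(1, 5, 6), 2))  # -0.67
print(x_from_z(2, 5, 6))           # 17
```

A positive result places x to the right of the mean and a negative result places it to the left, matching the summary of the example.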

Example  6.2 

Some  doctors  believe  that  a  person  can  lose  5  pounds,  on  the  average,  in  a  month  by  reducing 
his/her  fat  intake  and  by  exercising  consistently.  Suppose  weight  loss  has  a  normal  distribution. 
Let X = the amount of weight lost (in pounds) by a person in a month. Use a standard deviation
of 2 pounds. X ~ N(5, 2). Fill in the blanks.

Problem 1  (Solution on p. 285.)

Suppose a person lost 10 pounds in a month. The z-score when x = 10 pounds is z = 2.5
(verify). This z-score tells you that x = 10 is ________ standard deviations to the ________ (right
or left) of the mean ________ (What is the mean?).

Problem 2  (Solution on p. 285.)

Suppose a person gained 3 pounds (a negative weight loss). Then z = ________. This z-score
tells you that x = -3 is ________ standard deviations to the ________ (right or left) of the mean.

Suppose the random variables X and Y have the following normal distributions: X ~ N(5, 6) and
Y ~ N(2, 1). If x = 17, then z = 2. (This was previously shown.) If y = 4, what is z?

$z = \frac{y - \mu}{\sigma} = \frac{4 - 2}{1} = 2$  where μ = 2 and σ = 1.  (6.6)

This content is available online at <http://cnx.org/content/m16991/1.10/>.

The z-score for y = 4 is z = 2. This means that 4 is z = 2 standard deviations to the right of
the  mean.  Therefore,  x  =  17  and  y  =  4  are  both  2  (of  their)  standard  deviations  to  the  right  of 
their  respective  means. 

The  z-score  allows  us  to  compare  data  that  are  scaled  differently.  To  understand  the  concept, 
suppose X ~ N(5, 6) represents weight gains for one group of people who are trying to gain
weight in a 6 week period and Y ~ N(2, 1) measures the same weight gain for a second group
of  people.  A  negative  weight  gain  would  be  a  weight  loss.  Since  x  =  17  and  y  =  4  are  each  2 
standard  deviations  to  the  right  of  their  means,  they  represent  the  same  weight  gain  relative  to 
their  means. 

The  Empirical  Rule 

If X is a random variable and has a normal distribution with mean μ and standard deviation σ, then the
Empirical Rule says (see the figure below):

•  About 68.27% of the x values lie between -1σ and +1σ of the mean μ (within 1 standard deviation of
the mean).

•  About 95.45% of the x values lie between -2σ and +2σ of the mean μ (within 2 standard deviations of
the mean).

•  About 99.73% of the x values lie between -3σ and +3σ of the mean μ (within 3 standard deviations of
the mean). Notice that almost all the x values lie within 3 standard deviations of the mean.

•  The z-scores for +1σ and -1σ are +1 and -1, respectively.

•  The z-scores for +2σ and -2σ are +2 and -2, respectively.

•  The z-scores for +3σ and -3σ are +3 and -3, respectively.


The  Empirical  Rule  is  also  known  as  the  68-95-99.7  Rule. 
Example  6.3 

Suppose  X  has  a  normal  distribution  with  mean  50  and  standard  deviation  6. 

•  About 68.27% of the x values lie between -1σ = (-1)(6) = -6 and 1σ = (1)(6) = 6 of the mean 50.
The values 50 - 6 = 44 and 50 + 6 = 56 are within 1 standard deviation of the mean 50. The
z-scores are -1 and +1 for 44 and 56, respectively.

•  About 95.45% of the x values lie between -2σ = (-2)(6) = -12 and 2σ = (2)(6) = 12 of the mean
50. The values 50 - 12 = 38 and 50 + 12 = 62 are within 2 standard deviations of the mean 50.
The z-scores are -2 and +2 for 38 and 62, respectively.

•  About 99.73% of the x values lie between -3σ = (-3)(6) = -18 and 3σ = (3)(6) = 18 of the mean
50. The values 50 - 18 = 32 and 50 + 18 = 68 are within 3 standard deviations of the mean 50.
The z-scores are -3 and +3 for 32 and 68, respectively.
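
The percentages in the Empirical Rule can be checked with a short Python sketch. The helper name prob_within is made up for this illustration; the mean 50 and standard deviation 6 are those of Example 6.3.

```python
from statistics import NormalDist


def prob_within(k, mu=0.0, sigma=1.0):
    """Probability that a normal value lies within k standard deviations of the mean."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)


# Example 6.3: mean 50, standard deviation 6
for k in (1, 2, 3):
    low, high = 50 - k * 6, 50 + k * 6
    percent = round(100 * prob_within(k, mu=50, sigma=6), 2)
    print(f"between {low} and {high}: about {percent}%")
```

The same three percentages result for any mean and standard deviation, because they depend only on the z-scores -k and +k.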


[Figure: normal curve over the x-axis, with tick marks labeled -3σ, -2σ, -1σ, μ, 1σ, 2σ, 3σ]

6.4  Areas to the Left and Right of x

The arrow in the graph below points to the area to the left of x. This area is represented by the probability
P(X < x). Normal tables, computers, and calculators provide or calculate the probability P(X < x).

[Figure: normal curve with an arrow pointing to the area to the left of x]

The area to the right is then P(X > x) = 1 - P(X < x).
Remember, P(X < x) = Area to the left of the vertical line through x.
P(X > x) = 1 - P(X < x) = Area to the right of the vertical line through x.

P(X < x) is the same as P(X ≤ x) and P(X > x) is the same as P(X ≥ x) for continuous distributions.

6.5  Calculations of Probabilities

Probabilities are calculated by using technology. There are instructions in the chapter for the TI-83+ and
TI-84 calculators.

NOTE: In the Table of Contents for Collaborative Statistics, entry 15. Tables has a link to a table
of normal probabilities. Use the probability tables if so desired, instead of a calculator. The tables
include instructions for how to use them.

Example  6.4 

If  the  area  to  the  left  is  0.0228,  then  the  area  to  the  right  is  1  -  0.0228  =  0.9772. 
Example  6.5 

The  final  exam  scores  in  a  statistics  class  were  normally  distributed  with  a  mean  of  63  and  a 
standard  deviation  of  5. 

Problem  1 

Find  the  probability  that  a  randomly  selected  student  scored  more  than  65  on  the  exam. 
Solution 

Let X = a score on the final exam. X ~ N(63, 5), where μ = 63 and σ = 5.
Draw a graph.
Then, find P(x > 65).

P(x > 65) = 0.3446 (calculator or computer)

This content is available online at <http://cnx.org/content/m16976/1.5/>.
This content is available online at <http://cnx.org/content/m16977/1.12/>.

[Figure: normal curve with 63 and 65 marked on the x-axis]


The  probability  that  one  student  scores  more  than  65  is  0.3446. 

Using the TI-83+ or the TI-84 calculators, the calculation is as follows. Go into 2nd DISTR.
After pressing 2nd DISTR, press 2: normalcdf.
The syntax for the instruction is shown below.

normalcdf(lower value, upper value, mean, standard deviation) For this problem:
normalcdf(65,1E99,63,5) = 0.3446. You get 1E99 ( = 10^99) by pressing 1, the EE key (a 2nd key) and then 99.
Or, you can enter 10^99 instead. The number 10^99 is way out in the right tail of the normal curve.
We are calculating the area between 65 and 10^99. In some instances, the lower number of the area
might be -1E99 ( = -10^99). The number -10^99 is way out in the left tail of the normal curve.

Historical  Note:  The  TI  probability  program  calculates  a  z-score  and  then  the  probability  from 
the  z-score.  Before  technology,  the  z-score  was  looked  up  in  a  standard  normal  probability  table 
(because  the  math  involved  is  too  cumbersome)  to  find  the  probability.  In  this  example,  a  standard 
normal  table  with  area  to  the  left  of  the  z-score  was  used.  You  calculate  the  z-score  and  look  up 
the  area  to  the  left.  The  probability  is  the  area  to  the  right. 

$z = \frac{65 - 63}{5} = 0.4$. Area to the left is 0.6554. P(x > 65) = P(z > 0.4) = 1 - 0.6554 = 0.3446
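
On a computer, the same probability can be found with Python's standard library. The sketch below is an illustration, not the calculator's own program: the function normalcdf is a hypothetical stand-in written to mirror the argument order of the TI instruction, and it is built on statistics.NormalDist.

```python
from statistics import NormalDist


def normalcdf(lower, upper, mean, sd):
    """Area under the normal curve between lower and upper (mirrors the TI argument order)."""
    dist = NormalDist(mean, sd)
    return dist.cdf(upper) - dist.cdf(lower)


# Problem 1: P(x > 65) for X ~ N(63, 5), using 1E99 as the far right tail.
p_right = normalcdf(65, 1e99, 63, 5)
print(round(p_right, 4))  # 0.3446

# The table method from the Historical Note: z-score first, then area to the left.
z = (65 - 63) / 5
area_left = NormalDist().cdf(z)
print(round(area_left, 4))      # 0.6554
print(round(1 - area_left, 4))  # 0.3446
```

Both routes give the same answer because the z-score simply re-expresses 65 in units of standard deviations from the mean.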

Problem  2 

Find  the  probability  that  a  randomly  selected  student  scored  less  than  85. 

Solution 

Draw  a  graph. 

Then find P(x < 85). Shade the graph. P(x < 85) = 1 (calculator or computer)
The probability that one student scores less than 85 is approximately 1 (or 100%).
The TI instructions and answer are as follows:
normalcdf(0,85,63,5) = 1 (rounds to 1)

Problem  3 

Find  the  90th  percentile  (that  is,  find  the  score  k  that  has  90  %  of  the  scores  below  k  and  10%  of 
the  scores  above  k). 

Solution 

Find  the  90th  percentile.  For  each  problem  or  part  of  a  problem,  draw  a  new  graph.  Draw  the 
X-axis.  Shade  the  area  that  corresponds  to  the  90th  percentile. 

Let k = the 90th percentile. k is located on the x-axis. P(x < k) is the area to the left of k. The 90th
percentile k separates the exam scores into those that are the same or lower than k and those that
are the same or higher. Ninety percent of the test scores are the same or lower than k and 10% are
the same or higher. k is often called a critical value.

k = 69.4 (calculator or computer)

The 90th percentile is 69.4. This means that 90% of the test scores fall at or below 69.4 and 10% fall
at or above. For the TI-83+ or TI-84 calculators, use invNorm in 2nd DISTR. invNorm(area to the
left, mean, standard deviation) For this problem, invNorm(0.90,63,5) = 69.4

Problem  4 

Find  the  70th  percentile  (that  is,  find  the  score  k  such  that  70%  of  scores  are  below  k  and  30%  of 
the  scores  are  above  k). 

Solution 

Find  the  70th  percentile. 

Draw a new graph and label it appropriately. k = 65.6

The 70th percentile is 65.6. This means that 70% of the test scores fall at or below 65.6 and 30% fall
at or above.

invNorm(0.70,63,5)  =  65.6 
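
If a calculator is not at hand, the same areas and percentiles can be checked on a computer. The following is a minimal sketch, assuming Python 3.8 or later and its standard-library statistics.NormalDist class (not part of the original text). Here cdf plays the role of normalcdf with a very small lower bound, and inv_cdf plays the role of invNorm. The variable names are chosen only for illustration.

```python
from statistics import NormalDist

# Exam scores in this example: mean 63, standard deviation 5.
scores = NormalDist(mu=63, sigma=5)

# Area to the right of 65 (the z = 0.4 calculation).
p_above_65 = 1 - scores.cdf(65)

# Area to the left of 85. The calculator call uses 0 as its lower bound;
# the area below 0 is far too small to change the rounded answer.
p_below_85 = scores.cdf(85)

# Percentiles: inv_cdf(area to the left) corresponds to invNorm.
k_90 = scores.inv_cdf(0.90)
k_70 = scores.inv_cdf(0.70)

print(round(p_above_65, 4))
print(round(p_below_85, 4))
print(round(k_90, 1), round(k_70, 1))
```

The rounded values should agree with the calculator answers shown in the problems above.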


Example  6.6 

A computer is used for office work at home, research, communication, personal finances, education,
entertainment, social networking and a myriad of other things. Suppose that the average
number  of  hours  a  household  personal  computer  is  used  for  entertainment  is  2  hours  per  day. 
Assume  the  times  for  entertainment  are  normally  distributed  and  the  standard  deviation  for  the 
times  is  half  an  hour. 

Problem  1 

Find  the  probability  that  a  household  personal  computer  is  used  between  1.8  and  2.75  hours  per 
day. 

Solution 

Let X = the amount of time (in hours) a household personal computer is used for entertainment.
X ~ N(2, 0.5) where μ = 2 and σ = 0.5.

Find P(1.8 < x < 2.75).

The probability for which you are looking is the area between x = 1.8 and x = 2.75.
P(1.8 < x < 2.75) = 0.5886

[Graph: normal curve with 1.8, 2, and 2.75 marked on the x-axis]


normalcdf(1.8,2.75,2,0.5)  =  0.5886 

The  probability  that  a  household  personal  computer  is  used  between  1.8  and  2.75  hours  per  day 
for  entertainment  is  0.5886. 


Problem  2 

Find  the  maximum  number  of  hours  per  day  that  the  bottom  quartile  of  households  use  a  personal 
computer  for  entertainment. 

Solution 

To  find  the  maximum  number  of  hours  per  day  that  the  bottom  quartile  of  households  uses  a 
personal computer for entertainment, find the 25th percentile, k, where P(x < k) = 0.25.


invNorm(0.25,2,.5)  =  1.66 

The  maximum  number  of  hours  per  day  that  the  bottom  quartile  of  households  uses  a  personal 
computer  for  entertainment  is  1.66  hours. 
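
Both parts of this example can also be reproduced on a computer. This is a sketch under the assumption that Python's standard-library statistics.NormalDist is available; the names hours, p_between, and q1 are illustrative and do not come from the text. The area between two values is the difference of two areas to the left.

```python
from statistics import NormalDist

# Entertainment time in hours: mean 2, standard deviation 0.5.
hours = NormalDist(mu=2, sigma=0.5)

# Area between 1.8 and 2.75 = (area left of 2.75) - (area left of 1.8).
p_between = hours.cdf(2.75) - hours.cdf(1.8)

# Bottom quartile cutoff: the 25th percentile.
q1 = hours.inv_cdf(0.25)

print(round(p_between, 4))
print(round(q1, 2))
```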
6.6 Summary of Formulas

Formula  6.1:  Normal  Probability  Distribution 

X ~ N(μ, σ)

μ = the mean       σ = the standard deviation

Formula 6.2: Standard Normal Probability Distribution
Z ~ N(0, 1)

z = a standardized value (z-score)

mean = 0        standard deviation = 1

Formula 6.3: Finding the kth Percentile

To find the kth percentile when the z-score is known: k = μ + (z)σ

Formula 6.4: z-score

z = (x - μ)/σ

Formula 6.5: Finding the area to the left
The area to the left: P(X < x)

Formula 6.6: Finding the area to the right

The area to the right: P(X > x) = 1 - P(X < x)
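
The formulas in this summary can be written as small helper functions. The sketch below is illustrative only: the function names are hypothetical, and it assumes Python's standard-library statistics.NormalDist for the areas. It shows that standardizing a value and converting a z-score back to a value are inverse steps, and that the area to the right is one minus the area to the left.

```python
from statistics import NormalDist

STANDARD = NormalDist(0, 1)  # Z ~ N(0, 1)

def z_score(x, mu, sigma):
    """z-score: how many standard deviations x lies from the mean."""
    return (x - mu) / sigma

def value_from_z(z, mu, sigma):
    """Value k that corresponds to a known z-score: k = mu + z * sigma."""
    return mu + z * sigma

def area_left(x, mu, sigma):
    """Area to the left: P(X < x) for X ~ N(mu, sigma)."""
    return NormalDist(mu, sigma).cdf(x)

def area_right(x, mu, sigma):
    """Area to the right: P(X > x) = 1 - P(X < x)."""
    return 1 - area_left(x, mu, sigma)

# Example with mean 63 and standard deviation 5.
z = z_score(65, 63, 5)
k_90 = value_from_z(STANDARD.inv_cdf(0.90), 63, 5)
print(z, round(area_right(65, 63, 5), 4), round(k_90, 1))
```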
6.7 Practice: The Normal Distribution

6.7.1  Student  Learning  Outcomes 

•  The  student  will  analyze  data  following  a  normal  distribution. 

6.7.2  Given 

The  life  of  Sunshine  CD  players  is  normally  distributed  with  a  mean  of  4.1  years  and  a  standard  deviation 
of  1.3  years.  A  CD  player  is  guaranteed  for  3  years.  We  are  interested  in  the  length  of  time  a  CD  player 
lasts. 


6.7.3  Normal  Distribution 
Exercise  6.7.1 

Define the random variable X in words. X =

Exercise  6.7.2 

X~ 

Exercise  6.7.3  (Solution  on  p.  285.) 

Find the probability that a CD player will break down during the guarantee period.

a. Sketch the situation. Label and scale the axes. Shade the region corresponding to the probability.


Figure 6.1


b. P(0 < x < ____) = ____ (Use zero (0) for the minimum value of x.)

Exercise  6.7.4  (Solution  on  p.  285.) 

Find  the  probability  that  a  CD  player  will  last  between  2.8  and  6  years. 

a. Sketch the situation. Label and scale the axes. Shade the region corresponding to the probability.


Figure 6.2


b. P(____ < x < ____) = ____

This content is available online at <http://cnx.org/content/m16983/1.10/>.

Exercise  6.7.5  (Solution  on  p.  285.) 

Find  the  70th  percentile  of  the  distribution  for  the  time  a  CD  player  lasts. 

a.  Sketch  the  situation.  Label  and  scale  the  axes.  Shade  the  region  corresponding  to  the  lower 
70%. 


Figure  6.3 


b. P(x < k) = ____. Therefore, k = ____
6.8 Homework

Exercise  6.8.1  (Solution  on  p.  285.) 

According  to  a  study  done  by  De  Anza  students,  the  height  for  Asian  adult  males  is  normally 
distributed  with  an  average  of  66  inches  and  a  standard  deviation  of  2.5  inches.  Suppose  one 
Asian adult male is randomly chosen. Let X = height of the individual.

a. X ~ ____(____, ____)

b. Find the probability that the person is between 65 and 69 inches. Include a sketch of the graph
and write a probability statement.

c. Would you expect to meet many Asian adult males over 72 inches? Explain why or why not,
and justify your answer numerically.

d. The middle 40% of heights fall between what two values? Sketch the graph and write the
probability statement.


Exercise  6.8.2 

IQ  is  normally  distributed  with  a  mean  of  100  and  a  standard  deviation  of  15.  Suppose  one 
individual is randomly chosen. Let X = IQ of an individual.

a. X ~ ____(____, ____)

b. Find the probability that the person has an IQ greater than 120. Include a sketch of the graph
and write a probability statement.

c. Mensa is an organization whose members have the top 2% of all IQs. Find the minimum IQ
needed to qualify for the Mensa organization. Sketch the graph and write the probability
statement.

d. The middle 50% of IQs fall between what two values? Sketch the graph and write the probability
statement.

Exercise  6.8.3  (Solution  on  p.  285.) 

The  percent  of  fat  calories  that  a  person  in  America  consumes  each  day  is  normally  distributed 
with  a  mean  of  about  36  and  a  standard  deviation  of  10.  Suppose  that  one  individual  is  randomly 
chosen. Let X = percent of fat calories.

a. X ~ ____(____, ____)

b. Find the probability that the percent of fat calories a person consumes is more than 40. Graph
the situation. Shade in the area to be determined.

c. Find the maximum number for the lower quarter of percent of fat calories. Sketch the graph
and write the probability statement.

Exercise  6.8.4 

Suppose  that  the  distance  of  fly  balls  hit  to  the  outfield  (in  baseball)  is  normally  distributed  with 
a  mean  of  250  feet  and  a  standard  deviation  of  50  feet. 

a. If X = distance in feet for a fly ball, then X ~ ____(____, ____)

b. If one fly ball is randomly chosen from this distribution, what is the probability that this ball
traveled fewer than 220 feet? Sketch the graph. Scale the horizontal axis X. Shade the region
corresponding to the probability. Find the probability.

c. Find the 80th percentile of the distribution of fly balls. Sketch the graph and write the probability
statement.

This content is available online at <http://cnx.org/content/m16978/1.20/>.
Exercise 6.8.5 (Solution on p. 285.)

In  China,  4-year-olds  average  3  hours  a  day  unsupervised.  Most  of  the  unsupervised  children  live 
in  rural  areas,  considered  safe.  Suppose  that  the  standard  deviation  is  1.5  hours  and  the  amount 
of  time  spent  alone  is  normally  distributed.  We  randomly  survey  one  Chinese  4-year-old  living  in 
a  rural  area.  We  are  interested  in  the  amount  of  time  the  child  spends  alone  per  day.  (Source:  San 
Jose  Mercury  News) 

a. In words, define the random variable X. X =

b. X ~

c. Find the probability that the child spends less than 1 hour per day unsupervised. Sketch the
graph and write the probability statement.

d.  What  percent  of  the  children  spend  over  10  hours  per  day  unsupervised? 

e.  70%  of  the  children  spend  at  least  how  long  per  day  unsupervised? 

Exercise  6.8.6 

In  the  1992  presidential  election,  Alaska's  40  election  districts  averaged  1956.8  votes  per  district 
for  President  Clinton.  The  standard  deviation  was  572.3.  (There  are  only  40  election  districts  in 
Alaska.)  The  distribution  of  the  votes  per  district  for  President  Clinton  was  bell-shaped.  Let  X  = 
number  of  votes  for  President  Clinton  for  an  election  district.  (Source:  The  World  Almanac  and 
Book  of  Facts) 

a.  State  the  approximate  distribution  of  X.  X~ 

b.  Is  1956.8  a  population  mean  or  a  sample  mean?  How  do  you  know? 

c.  Find  the  probability  that  a  randomly  selected  district  had  fewer  than  1600  votes  for  President 

Clinton.  Sketch  the  graph  and  write  the  probability  statement. 

d.  Find  the  probability  that  a  randomly  selected  district  had  between  1800  and  2000  votes  for 

President  Clinton. 

e.  Find  the  third  quartile  for  votes  for  President  Clinton. 

Exercise  6.8.7  (Solution  on  p.  285.) 

Suppose  that  the  duration  of  a  particular  type  of  criminal  trial  is  known  to  be  normally  distributed 
with  a  mean  of  21  days  and  a  standard  deviation  of  7  days. 

a.  In  words,  define  the  random  variable  X.  X  = 

b. X ~

c. If one of the trials is randomly chosen, find the probability that it lasted at least 24 days. Sketch
the graph and write the probability statement.

d.  60%  of  all  of  these  types  of  trials  are  completed  within  how  many  days? 

Exercise  6.8.8 

Terri Vogel, an amateur motorcycle racer, averages 129.71 seconds per 2.5 mile lap (in a 7 lap
race) with a standard deviation of 2.28 seconds. Her lap times are normally
distributed. We are interested in one of her randomly selected laps. (Source: log book of Terri
Vogel)

a. In words, define the random variable X. X =

b. X ~

c. Find the percent of her laps that are completed in less than 130 seconds.

d. The fastest 3% of her laps are under ____.

e. The middle 80% of her laps are from ____ seconds to ____ seconds.
Exercise 6.8.9 (Solution on p. 285.)

Thuy Dau, Ngoc Bui, Sam Su, and Lan Voung conducted a survey as to how long customers at
Lucky claimed to wait in the checkout line until their turn. Let X = time in line. Below are the
ordered real data (in minutes):


0.50    4.25    5       6       7.25
1.75    4.25    5.25    6       7.25
2       4.25    5.25    6.25    7.25
2.25    4.25    5.5     6.25    7.75
2.25    4.5     5.5     6.5     8
2.5     4.75    5.5     6.5     8.25
2.75    4.75    5.75    6.5     9.5
3.25    4.75    5.75    6.75    9.5
3.75    5       6       6.75    9.75
3.75    5       6       6.75    10.75

Table 6.1


a.  Calculate  the  sample  mean  and  the  sample  standard  deviation. 

b. Construct a histogram. Start the x-axis at -0.375 and make bar widths of 2 minutes.

c. Draw a smooth curve through the midpoints of the tops of the bars.

d. In words, describe the shape of your histogram and smooth curve.

e. Let the sample mean approximate μ and the sample standard deviation approximate σ. The
distribution of X can then be approximated by X ~

f. Use the distribution in (e) to calculate the probability that a person will wait fewer than 6.1
minutes.

g. Determine the cumulative relative frequency for waiting less than 6.1 minutes.

h. Why aren't the answers to (f) and (g) exactly the same?

i. Why are the answers to (f) and (g) as close as they are?

j. If only 10 customers were surveyed instead of 50, do you think the answers to (f) and (g) would
have been closer together or farther apart? Explain your conclusion.
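
Parts (a), (e), (f), and (g) of this exercise can be checked on a computer after you have worked them by hand. The following is a sketch only, assuming Python's standard-library statistics module; the variable names are illustrative. It fits a normal model from the rounded sample mean and sample standard deviation, then compares the model probability with the cumulative relative frequency counted directly from the 50 data values in Table 6.1.

```python
from statistics import NormalDist, mean, stdev

# The 50 ordered waiting times (in minutes) from Table 6.1.
wait_times = [
    0.50, 1.75, 2, 2.25, 2.25, 2.5, 2.75, 3.25, 3.75, 3.75,
    4.25, 4.25, 4.25, 4.25, 4.5, 4.75, 4.75, 4.75, 5, 5,
    5, 5.25, 5.25, 5.5, 5.5, 5.5, 5.75, 5.75, 6, 6,
    6, 6, 6.25, 6.25, 6.5, 6.5, 6.5, 6.75, 6.75, 6.75,
    7.25, 7.25, 7.25, 7.75, 8, 8.25, 9.5, 9.5, 9.75, 10.75,
]

x_bar = mean(wait_times)   # sample mean
s = stdev(wait_times)      # sample standard deviation (divides by n - 1)

# Normal model using the rounded sample statistics, as in part (e).
model = NormalDist(round(x_bar, 2), round(s, 2))
p_model = model.cdf(6.1)   # part (f): theoretical probability

# Part (g): cumulative relative frequency counted from the data.
crf = sum(t < 6.1 for t in wait_times) / len(wait_times)

print(round(x_bar, 2), round(s, 2))
print(round(p_model, 4), crf)
```

The two printed probabilities are close but not equal, which is the point of parts (h) and (i): one comes from a smooth theoretical curve and the other from a finite sample.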

Exercise  6.8.10 

Suppose that Ricardo and Anita attend different colleges. Ricardo's GPA is the same as the average
GPA at his school. Anita's GPA is 0.70 standard deviations above her school average. In
complete sentences, explain why each of the following statements may be false.

a.  Ricardo's  actual  GPA  is  lower  than  Anita's  actual  GPA. 

b.  Ricardo  is  not  passing  since  his  z-score  is  zero. 

c.  Anita  is  in  the  70th  percentile  of  students  at  her  college. 

Exercise  6.8.11  (Solution  on  p.  286.) 

Below  is  a  sample  of  the  maximum  capacity  (maximum  number  of  spectators)  of  sports 
stadiums.  The  table  does  not  include  horse  racing  or  motor  racing  stadiums.  (Source: 
http://en.wikipedia.org/wiki/List_of_stadiums_by_capacity) 
40,000    40,000    45,050    45,500    46,249
48,134    49,133    50,071    50,096    50,466
50,832    51,100    51,500    51,900    52,000
52,132    52,200    52,530    52,692    53,864
54,000    55,000    55,000    55,000    55,000
55,000    55,000    55,082    57,000    58,008
59,680    60,000    60,000    60,492    60,580
62,380    62,872    64,035    65,000    65,050
65,647    66,000    66,161    67,428    68,349
68,976    69,372    70,107    70,585    71,594
72,000    72,922    73,379    74,500    75,025
76,212    78,000    80,000    80,000    82,300

Table 6.2


a. Calculate the sample mean and the sample standard deviation for the maximum capacity of
sports stadiums (the data).

b. Construct a histogram of the data.

c. Draw a smooth curve through the midpoints of the tops of the bars of the histogram.

d. In words, describe the shape of your histogram and smooth curve.

e. Let the sample mean approximate μ and the sample standard deviation approximate σ. The
distribution of X can then be approximated by X ~

f. Use the distribution in (e) to calculate the probability that the maximum capacity of sports
stadiums is less than 67,000 spectators.

g. Determine the cumulative relative frequency that the maximum capacity of sports stadiums is
less than 67,000 spectators. Hint: Order the data and count the sports stadiums that have a
maximum capacity less than 67,000. Divide by the total number of sports stadiums in the
sample.

h.  Why  aren't  the  answers  to  (f)  and  (g)  exactly  the  same? 


6.8.1  Try  These  Multiple  Choice  Questions 

The questions below refer to the following: The patient recovery time from a particular surgical procedure
is normally distributed with a mean of 5.3 days and a standard deviation of 2.1 days.

Exercise  6.8.12  (Solution  on  p.  286.) 

What  is  the  median  recovery  time? 

A.  2.7 

B.  5.3 

C.  7.4 

D.  2.1 


Exercise  6.8.13  (Solution  on  p.  286.) 

What  is  the  z-score  for  a  patient  who  takes  10  days  to  recover? 

A.  1.5 

B.  0.2 
C. 2.2

D.  7.3 


Exercise  6.8.14  (Solution  on  p.  286.) 

What  is  the  probability  of  spending  more  than  2  days  in  recovery? 

A.  0.0580 

B.  0.8447 

C.  0.0553 

D.  0.9420 


Exercise  6.8.15  (Solution  on  p.  286.) 

The  90th  percentile  for  recovery  times  is? 

A.  8.89 

B.  7.07 

C.  7.99 

D.  4.32 


The  questions  below  refer  to  the  following:  The  length  of  time  to  find  a  parking  space  at  9  A.M.  follows  a 
normal  distribution  with  a  mean  of  5  minutes  and  a  standard  deviation  of  2  minutes. 

Exercise  6.8.16  (Solution  on  p.  286.) 

Based  upon  the  above  information  and  numerically  justified,  would  you  be  surprised  if  it  took 
less  than  1  minute  to  find  a  parking  space? 

A.  Yes 

B.  No 

C.  Unable  to  determine 


Exercise  6.8.17  (Solution  on  p.  286.) 

Find  the  probability  that  it  takes  at  least  8  minutes  to  find  a  parking  space. 

A.  0.0001 

B.  0.9270 

C.  0.1862 

D.  0.0668 


Exercise  6.8.18  (Solution  on  p.  286.) 

Seventy  percent  of  the  time,  it  takes  more  than  how  many  minutes  to  find  a  parking  space? 

A.  1.24 

B.  2.41 

C.  3.95 

D.  6.05 


Exercise  6.8.19  (Solution  on  p.  286.) 

If  the  mean  is  significantly  greater  than  the  standard  deviation,  which  of  the  following  statements 
is  true? 

I. The data cannot follow the uniform distribution.

II. The data cannot follow the exponential distribution.

III. The data cannot follow the normal distribution.

A. I only

B. II only

C. III only

D.  I,  II,  and  III 
6.9 Review

The next two questions refer to: X ~ U(3, 13)

Exercise 6.9.1 (Solution on p. 286.)

Explain which of the following are false and which are true.

a: f(x) = 1/10, 3 ≤ x ≤ 13

b: There is no mode.

c: The median is less than the mean.

d: P(x > 10) = P(x < 6)

Exercise  6.9.2  (Solution  on  p.  286.) 

Calculate: 

a:  Mean 

b:  Median 

c:  65th  percentile. 


Exercise 6.9.3 (Solution on p. 286.)

[Figure: box plot; the axis values 0, 2, 4, and 7 are legible.]

Which of the following is true for the above box plot?
a: 25% of the data are at most 5.
b: There is about the same amount of data from 4 - 5 as there is from 5 - 7.
c: There are no data values of 3.
d: 50% of the data are 4.

Exercise 6.9.4 (Solution on p. 286.)

If P(G | H) = P(G), then which of the following is correct?

A: G and H are mutually exclusive events.
B: P(G) = P(H)
C: Knowing that H has occurred will affect the chance that G will happen.
D: G and H are independent events.


Exercise  6.9.5  (Solution  on  p.  286.) 

If P(J) = 0.3, P(K) = 0.6, and J and K are independent events, then explain which are correct
and which are incorrect.

A: P(J and K) = 0
B: P(J or K) = 0.9
C: P(J or K) = 0.72
D: P(J) ≠ P(J | K)


This content is available online at <http://cnx.org/content/m16985/1.10/>.
Exercise 6.9.6 (Solution on p. 287.)

On average, 5 students from each high school class get full scholarships to 4-year colleges. Assume
that most high school classes have about 500 students.

X  =  the  number  of  students  from  a  high  school  class  that  get  full  scholarships  to  4-year  school. 
Which  of  the  following  is  the  distribution  of  X? 

A.  P(5) 

B.  B(500,5) 

C. Exp(1/5)

D.  N(5,  (0.01)(0.99)/500) 
6.10 Lab 1: Normal Distribution (Lap Times)

Class  Time: 
Names: 

6.10.1  Student  Learning  Outcome: 

•  The student will compare and contrast empirical data and a theoretical distribution to determine if
Terri Vogel's lap times fit a continuous distribution.

6.10.2  Directions: 

Round  the  relative  frequencies  and  probabilities  to  4  decimal  places.  Carry  all  other  decimal  answers  to  2 
places. 

6.10.3  Collect  the  Data 

1.  Use  the  data  from  Terri  Vogel's  Log  Book  (Section  14.3.1:  Lap  Times).  Use  a  Stratified  Sampling 
Method  by  Lap  (Races  1  -  20)  and  a  random  number  generator  to  pick  6  lap  times  from  each  stratum. 
Record  the  lap  times  below  for  Laps  2-7. 


Table  6.3 

2. Construct a histogram. Make 5-6 intervals. Sketch the graph using a ruler and pencil. Scale the axes.
This content is available online at <http://cnx.org/content/m16981/1.18/>.


Available for free at Connexions <http://cnx.org/content/col10522/1.40>


Frequency 


Lap  Time 


Figure  6.4 


3.  Calculate  the  following. 

a. x̄ =

b.  s  = 

4.  Draw  a  smooth  curve  through  the  tops  of  the  bars  of  the  histogram.  Use  1-2  complete  sentences  to 
describe  the  general  shape  of  the  curve.  (Keep  it  simple.  Does  the  graph  go  straight  across,  does  it 
have  a  V-shape,  does  it  have  a  hump  in  the  middle  or  at  either  end,  etc.?) 

6.10.4  Analyze  the  Distribution 

Using  your  sample  mean,  sample  standard  deviation,  and  histogram  to  help,  what  was  the  approximate 
theoretical  distribution  of  the  data? 

•  X~ 

•  How  does  the  histogram  help  you  arrive  at  the  approximate  distribution? 

6.10.5  Describe  the  Data 

Use  the  Data  from  the  section  titled  "Collect  the  Data"  to  complete  the  following  statements. 

•  The IQR goes from ____ to ____.

•  IQR = ____. (IQR = Q3 - Q1)

•  The  15th  percentile  is: 

•  The  85th  percentile  is: 

•  The  median  is: 

•  The  empirical  probability  that  a  randomly  chosen  lap  time  is  more  than  130  seconds  = 

•  Explain  the  meaning  of  the  85th  percentile  of  this  data. 
6.10.6 Theoretical Distribution

Using the theoretical distribution from the section titled "Analyze the Distribution" complete the following
statements:

•  The IQR goes from ____ to ____.

•  IQR =

•  The  15th  percentile  is: 

•  The  85th  percentile  is: 

•  The  median  is: 

•  The  probability  that  a  randomly  chosen  lap  time  is  more  than  130  seconds  = 

•  Explain  the  meaning  of  the  85th  percentile  of  this  distribution. 


6.10.7  Discussion  Questions 

•  Do the data from the section titled "Collect the Data" give a close approximation to the theoretical
distribution in the section titled "Analyze the Distribution"? In complete sentences and comparing the
result in the sections titled "Describe the Data" and "Theoretical Distribution", explain why or why
not.
6.11 Lab 2: Normal Distribution (Pinkie Length)

Class  Time: 
Names: 

6.11.1  Student  Learning  Outcomes: 

•  The  student  will  compare  empirical  data  and  a  theoretical  distribution  to  determine  if  data  from  the 
experiment  follow  a  continuous  distribution. 

6.11.2  Collect  the  Data 

Measure  the  length  of  your  pinkie  finger  (in  cm.) 

1.  Randomly  survey  30  adults.  Round  to  the  nearest  0.5  cm. 


Table  6.4 

2.  Construct  a  histogram.  Make  5-6  intervals.  Sketch  the  graph  using  a  ruler  and  pencil.  Scale  the  axes. 


Frequency 


Length  of  Finger 


3. Calculate the following.

a. x̄ =
b. s =

This content is available online at <http://cnx.org/content/m16980/1.16/>.

4. Draw a smooth curve through the top of the bars of the histogram. Use 1-2 complete sentences to
describe the general shape of the curve. (Keep it simple. Does the graph go straight across, does it
have a V-shape, does it have a hump in the middle or at either end, etc.?)


6.11.3  Analyze  the  Distribution 

Using your sample mean, sample standard deviation, and histogram to help, what was the approximate
theoretical distribution of the data from the section titled "Collect the Data"?

•  X~ 

•  How  does  the  histogram  help  you  arrive  at  the  approximate  distribution? 

6.11.4  Describe  the  Data 

Using  the  data  in  the  section  titled  "Collect  the  Data"  complete  the  following  statements.  (Hint:  order  the 
data) 

Remember: (IQR = Q3 - Q1)

•  IQR  = 

•  15th  percentile  is: 

•  85th  percentile  is: 

•  Median  is: 

•  What  is  the  empirical  probability  that  a  randomly  chosen  pinkie  length  is  more  than  6.5  cm? 

•  Explain the meaning of the 85th percentile of this data.


6.11.5  Theoretical  Distribution 

Using  the  Theoretical  Distribution  in  the  section  titled  "Analyze  the  Distribution" 

•  IQR  = 

•  15th  percentile  is: 

•  85th  percentile  is: 

•  Median  is: 

•  What  is  the  theoretical  probability  that  a  randomly  chosen  pinkie  length  is  more  than  6.5  cm? 

•  Explain  the  meaning  of  the  85th  percentile  of  this  data. 


6.11.6  Discussion  Questions 

•  Do the data from the section entitled "Collect the Data" give a close approximation to the theoretical
distribution in "Analyze the Distribution"? In complete sentences and comparing the results in the
sections titled "Describe the Data" and "Theoretical Distribution", explain why or why not.
Solutions to Exercises in Chapter 6

Solution  to  Example  6.2,  Problem  1  (p.  263) 

This z-score tells you that x = 10 is 2.5 standard deviations to the right of the mean 5.
Solution to Example 6.2, Problem 2 (p. 263)

z = -4. This z-score tells you that x = -3 is 4 standard deviations to the left of the mean.

Solutions  to  Practice:  The  Normal  Distribution 
Solution  to  Exercise  6.7.3  (p.  270) 

b.  3,0.1979 

Solution  to  Exercise  6.7.4  (p.  270) 
b.  2.8,6,0.7694 

Solution  to  Exercise  6.7.5  (p.  271) 

b. 0.70, 4.78 years

Solutions  to  Homework 
Solution  to  Exercise  6.8.1  (p.  272) 

a.  N  (66, 2.5) 

b.  0.5404 

c.  No 

d.  Between  64.7  and  67.3  inches 
Solution  to  Exercise  6.8.3  (p.  272) 

a.  N  (36,10) 

b.  0.3446 

c.  29.3 

Solution  to  Exercise  6.8.5  (p.  273) 

a.  the  time  (in  hours)  a  4-year-old  in  China  spends  unsupervised  per  day 

b.  N(3,1.5) 

c.  0.0912 

d.  0 

e.  2.21  hours 

Solution  to  Exercise  6.8.7  (p.  273) 

a.  The  duration  of  a  criminal  trial 

b.  N(21,7) 

c.  0.3341 

d.  22.77 

Solution  to  Exercise  6.8.9  (p.  274) 

a.  The  sample  mean  is  5.51  and  the  sample  standard  deviation  is  2.15 

e.  N  (5.51,2.15) 

f.  0.6081 

g.  0.64 
Solution to Exercise 6.8.11 (p. 274)

a.  The  sample  mean  is  60,136.4  and  the  sample  standard  deviation  is  10,468.1. 
e.  N  (60136.4, 10468.1) 


f.  0.7440 

g.  0.7167 

Solution to Exercise 6.8.12 (p. 275)

B

Solution to Exercise 6.8.13 (p. 275)

C

Solution to Exercise 6.8.14 (p. 276)

D

Solution to Exercise 6.8.15 (p. 276)

C

Solution to Exercise 6.8.16 (p. 276)

A

Solution to Exercise 6.8.17 (p. 276)

D

Solution to Exercise 6.8.18 (p. 276)

C

Solution to Exercise 6.8.19 (p. 276)

B

Solutions to Review
Solution to Exercise 6.9.1 (p. 278)

a: True
b: True
c: False - the median and the mean are the same for this symmetric distribution
d: True

Solution to Exercise 6.9.2 (p. 278)

a: 8
b: 8
c: 9.5

Solution to Exercise 6.9.3 (p. 278)

a: False - 3/4 of the data are at most 5
b: True - each quartile has 25% of the data
c: False - that is unknown
d: False - 50% of the data are 4 or less

Solution to Exercise 6.9.4 (p. 278)

D

Solution to Exercise 6.9.5 (p. 278)

A: False - J and K are independent so they are not mutually exclusive which would imply dependency
(meaning P(J and K) is not 0).
B: False - see answer C.
C: True - P(J or K) = P(J) + P(K) - P(J and K) = P(J) + P(K) - P(J)P(K) = 0.3 + 0.6 - (0.3)(0.6) = 0.72. Note that
P(J and K) = P(J)P(K) because J and K are independent.
D: False - J and K are independent so P(J) = P(J | K).

Solution  to  Exercise  6.9.6  (p.  279) 

A 
Chapter 7

The  Central  Limit  Theorem 


7.1 The Central Limit Theorem

7.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

•  Recognize  the  Central  Limit  Theorem  problems. 

•  Classify  continuous  word  problems  by  their  distributions. 

•  Apply  and  interpret  the  Central  Limit  Theorem  for  Means. 

•  Apply  and  interpret  the  Central  Limit  Theorem  for  Sums. 


7.1.2  Introduction 

Why  are  we  so  concerned  with  means?  Two  reasons  are  that  they  give  us  a  middle  ground  for  comparison 
and  they  are  easy  to  calculate.  In  this  chapter,  you  will  study  means  and  the  Central  Limit  Theorem. 

The Central Limit Theorem (CLT for short) is one of the most powerful and useful ideas in all of statistics.
There are two alternative forms of the theorem, and both alternatives are concerned with drawing finite
samples of size n from a population with a known mean, μ, and a known standard deviation, σ. The first
alternative says that if we collect samples of size n and n is "large enough," calculate each sample's mean,
and create a histogram of those means, then the resulting histogram will tend to have an approximate
normal bell shape. The second alternative says that if we again collect samples of size n that are "large
enough," calculate the sum of each sample and create a histogram, then the resulting histogram will again
tend to have a normal bell shape.

In either case, it does not matter what the distribution of the original population is, or whether you even
need to know it. The important fact is that the sample means and the sums tend to follow the normal
distribution. You will learn the rest in this chapter.

The size of the sample, n, that is required in order to be "large enough" depends on the original
population from which the samples are drawn. If the original population is far from normal, then more
observations are needed for the sample means or the sample sums to be normal. Sampling is done with
replacement.

Optional  Collaborative  Classroom  Activity 
Do the following example in class: Suppose 8 of you roll 1 fair die 10 times, 7 of you roll 2 fair dice 10
times,  9  of  you  roll  5  fair  dice  10  times,  and  11  of  you  roll  10  fair  dice  10  times. 

Each time a person rolls more than one die, he/she calculates the sample mean of the faces showing. For
example, one person might roll 5 fair dice and get a 2, 2, 3, 4, 6 on one roll.

The mean is (2+2+3+4+6)/5 = 3.4. The 3.4 is one mean when 5 fair dice are rolled. This same person would
roll the 5 dice 9 more times and calculate 9 more means for a total of 10 means.

Your instructor will pass out the dice to several people as described above. Roll your dice 10 times. For
each roll, record the faces and find the mean. Round to the nearest 0.5.

Your  instructor  (and  possibly  you)  will  produce  one  graph  (it  might  be  a  histogram)  for  1  die,  one  graph  for 
2  dice,  one  graph  for  5  dice,  and  one  graph  for  10  dice.  Since  the  "mean"  when  you  roll  one  die,  is  just  the 
face  on  the  die,  what  distribution  do  these  means  appear  to  be  representing? 

Draw  the  graph  for  the  means  using  2  dice.  Do  the  sample  means  show  any  kind  of  pattern? 

Draw  the  graph  for  the  means  using  5  dice.  Do  you  see  any  pattern  emerging? 

Finally, draw the graph for the means using 10 dice. Do you see any pattern to the graph? What can you
conclude as you increase the number of dice?

As  the  number  of  dice  rolled  increases  from  1  to  2  to  5  to  10,  the  following  is  happening: 

1.  The  mean  of  the  sample  means  remains  approximately  the  same. 

2.  The  spread  of  the  sample  means  (the  standard  deviation  of  the  sample  means)  gets  smaller. 

3.  The  graph  appears  steeper  and  thinner. 

You  have  just  demonstrated  the  Central  Limit  Theorem  (CLT). 

The  Central  Limit  Theorem  tells  you  that  as  you  increase  the  number  of  dice,  the  sample  means  tend 
toward  a  normal  distribution  (the  sampling  distribution). 
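
The classroom activity can also be tried as a simulation. The Python sketch below is an illustration only and is not part of the original activity: the function name `simulate_sample_means`, the seed, and the 20,000 trials per setting are assumptions chosen for the example. It rolls 1, 2, 5, and 10 fair dice repeatedly and reports the mean and the standard deviation of the resulting sample means.

```python
import random
import statistics

def simulate_sample_means(num_dice, trials, rng):
    """Roll `num_dice` fair dice `trials` times; return the mean of each roll."""
    return [
        statistics.fmean(rng.randint(1, 6) for _ in range(num_dice))
        for _ in range(trials)
    ]

rng = random.Random(7)  # arbitrary fixed seed (assumption) so a run is repeatable
results = {}
for num_dice in (1, 2, 5, 10):
    means = simulate_sample_means(num_dice, 20_000, rng)
    # Store (mean of the sample means, standard deviation of the sample means).
    results[num_dice] = (statistics.fmean(means), statistics.stdev(means))
    print(f"{num_dice:2d} dice: mean of means = {results[num_dice][0]:.3f}, "
          f"sd of means = {results[num_dice][1]:.3f}")
```

In a run of this kind, the mean of the sample means should stay near 3.5 while the standard deviation shrinks as more dice are averaged, which matches the three observations listed above.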

7.2  The  Central  Limit  Theorem  for  Sample  Means  (Averages)^ 

Suppose X is a random variable with a distribution that may be known or unknown (it can be any
distribution). Using a subscript that matches the random variable, suppose:

a. $\mu_X$ = the mean of X

b. $\sigma_X$ = the standard deviation of X

If you draw random samples of size n, then as n increases, the random variable $\bar{X}$, which consists of
sample means, tends to be normally distributed and

$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right)$

The Central Limit Theorem for Sample Means says that if you keep drawing larger and larger samples
(like rolling 1, 2, 5, and, finally, 10 dice) and calculating their means, the sample means form their own
normal distribution (the sampling distribution). The normal distribution has the same mean as the
original distribution and a variance that equals the original variance divided by n, the sample size. n is
the number of values that are averaged together, not the number of times the experiment is done.


^This content is available online at <http://cnx.org/content/m16947/1.23/>.

To put it more formally, if you draw random samples of size n, the distribution of the random variable
$\bar{X}$, which consists of sample means, is called the sampling distribution of the mean. The sampling
distribution of the mean approaches a normal distribution as n, the sample size, increases.

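The random variable $\bar{X}$ has a different z-score associated with it than the random variable X.
$\bar{x}$ is the value of $\bar{X}$ in one sample.

$z = \frac{\bar{x} - \mu_X}{\sigma_X / \sqrt{n}}$   (7.1)

$\mu_X$ is both the average of X and of $\bar{X}$.

$\sigma_{\bar{X}} = \frac{\sigma_X}{\sqrt{n}}$ = standard deviation of $\bar{X}$ and is called the standard error of the mean.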


Example  7.1 

An  unknown  distribution  has  a  mean  of  90  and  a  standard  deviation  of  15.  Samples  of  size  n  =  25 
are  drawn  randomly  from  the  population. 

Problem  1 

Find  the  probability  that  the  sample  mean  is  between  85  and  92. 
Solution 

Let X = one value from the original unknown population. The probability question asks you to
find a probability for the sample mean.

Let $\bar{X}$ = the mean of a sample of size 25. Since $\mu_X = 90$, $\sigma_X = 15$, and $n = 25$,

then $\bar{X} \sim N\left(90, \frac{15}{\sqrt{25}}\right)$

Find $P(85 < \bar{x} < 92)$. Draw a graph.
$P(85 < \bar{x} < 92) = 0.6997$

The probability that the sample mean is between 85 and 92 is 0.6997.

[Figure: normal curve with the area for $P(85 < \bar{x} < 92)$ shaded]

TI-83 or 84: normalcdf(lower value, upper value, mean, standard error of the mean)
The parameter list is abbreviated (lower value, upper value, $\mu$, $\sigma / \sqrt{n}$)

normalcdf(85, 92, 90, 15/√25) = 0.6997

Problem 2

Find the value that is 2 standard deviations above the expected value (it is 90) of the sample mean.

Solution

To find the value that is 2 standard deviations above the expected value 90, use the formula

value = $\mu_X + (\text{number of standard deviations}) \cdot \frac{\sigma_X}{\sqrt{n}}$

value = $90 + 2 \cdot \frac{15}{\sqrt{25}} = 96$

So, the value that is 2 standard deviations above the expected value is 96.
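
As a cross-check on Example 7.1, the same two quantities can be computed without a calculator. The sketch below assumes Python 3.8 or later and uses `statistics.NormalDist` from the standard library in place of normalcdf; the variable names are illustrative and do not come from the text.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 90, 15, 25            # values given in Example 7.1
standard_error = sigma / sqrt(n)     # 15 / 5 = 3
xbar_dist = NormalDist(mu, standard_error)

# Problem 1: P(85 < x-bar < 92), the analogue of normalcdf(85, 92, 90, 3)
prob = xbar_dist.cdf(92) - xbar_dist.cdf(85)

# Problem 2: the value 2 standard errors above the expected value
two_above = mu + 2 * standard_error

print(round(prob, 4))   # about 0.6997, as in the text
print(two_above)        # 96.0
```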


Example  7.2 

The  length  of  time,  in  hours,  it  takes  an  "over  40"  group  of  people  to  play  one  soccer  match  is 
normally  distributed  with  a  mean  of  2  hours  and  a  standard  deviation  of  0.5  hours.  A  sample  of 
size  n  =  50  is  drawn  randomly  from  the  population. 


Find  the  probability  that  the  sample  mean  is  between  1.8  hours  and  2.3  hours. 
Solution 

Let X = the time, in hours, it takes to play one soccer match.

The probability question asks you to find a probability for the sample mean time, in hours, it
takes to play one soccer match.

Let $\bar{X}$ = the mean time, in hours, it takes to play one soccer match.

If $\mu_X$ = ______, $\sigma_X$ = ______, and n = ______, then $\bar{X} \sim N$(______, ______)
by the Central Limit Theorem for Means.

Solution

$\mu_X = 2$, $\sigma_X = 0.5$, $n = 50$, and $\bar{X} \sim N\left(2, \frac{0.5}{\sqrt{50}}\right)$

Problem

Find $P(1.8 < \bar{x} < 2.3)$. Draw a graph.

Solution

$P(1.8 < \bar{x} < 2.3) = 0.9977$

The probability that the mean time is between 1.8 hours and 2.3 hours is ______.

7.3 The Central Limit Theorem for Sums^

Suppose X is a random variable with a distribution that may be known or unknown (it can be any
distribution) and suppose:

a. $\mu_X$ = the mean of X

b. $\sigma_X$ = the standard deviation of X

If you draw random samples of size n, then as n increases, the random variable $\Sigma X$, which consists of
sums, tends to be normally distributed and

$\Sigma X \sim N\left(n \cdot \mu_X, \sqrt{n} \cdot \sigma_X\right)$

The  Central  Limit  Theorem  for  Sums  says  that  if  you  keep  drawing  larger  and  larger  samples  and  taking 
their  sums,  the  sums  form  their  own  normal  distribution  (the  sampling  distribution)  which  approaches  a 
normal  distribution  as  the  sample  size  increases.  The  normal  distribution  has  a  mean  equal  to  the  original 
mean  multiplied  by  the  sample  size  and  a  standard  deviation  equal  to  the  original  standard  deviation 
multiplied  by  the  square  root  of  the  sample  size. 

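The random variable $\Sigma X$ has the following z-score associated with it:

a. $\Sigma x$ is one sum.

b. $z = \frac{\Sigma x - n \cdot \mu_X}{\sqrt{n} \cdot \sigma_X}$

a. $n \cdot \mu_X$ = the mean of $\Sigma X$

b. $\sqrt{n} \cdot \sigma_X$ = standard deviation of $\Sigma X$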

Example  7.3 

An  unknown  distribution  has  a  mean  of  90  and  a  standard  deviation  of  15.  A  sample  of  size  80  is 
drawn  randomly  from  the  population. 

Problem

a. Find the probability that the sum of the 80 values (or the total of the 80 values) is more than
7500.

b. Find the sum that is 1.5 standard deviations above the mean of the sums.

Solution

Let X = one value from the original unknown population. The probability question asks you to
find a probability for the sum (or total of) 80 values.

$\Sigma X$ = the sum or total of 80 values. Since $\mu_X = 90$, $\sigma_X = 15$, and $n = 80$, then
$\Sigma X \sim N\left(80 \cdot 90, \sqrt{80} \cdot 15\right)$

• mean of the sums = $n \cdot \mu_X = (80)(90) = 7200$

• standard deviation of the sums = $\sqrt{n} \cdot \sigma_X = \sqrt{80} \cdot 15$

• sum of 80 values = $\Sigma x = 7500$

a: Find $P(\Sigma x > 7500)$

$P(\Sigma x > 7500) = 0.0127$

normalcdf(lower value, upper value, mean of sums, stdev of sums)
The parameter list is abbreviated (lower, upper, $n \cdot \mu_X$, $\sqrt{n} \cdot \sigma_X$)
normalcdf(7500, 1E99, 80 · 90, √80 · 15) = 0.0127
Reminder: 1E99 = $10^{99}$. Press the EE key for E.

b: Find $\Sigma x$ where z = 1.5:

$\Sigma x = n \cdot \mu_X + z \cdot \sqrt{n} \cdot \sigma_X = (80)(90) + (1.5)(\sqrt{80})(15) = 7401.2$

^This content is available online at <http://cnx.org/content/m16948/1.16/>.
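
The two parts of Example 7.3 can be reproduced with a few lines of Python. This is a sketch under the assumption that `statistics.NormalDist` (Python 3.8 or later) stands in for the calculator's normalcdf; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 90, 15, 80
# Distribution of the sums: N(n * mu, sqrt(n) * sigma) = N(7200, sqrt(80) * 15)
sum_dist = NormalDist(n * mu, sqrt(n) * sigma)

# Part a: P(sum > 7500), the analogue of normalcdf(7500, 1E99, 7200, sqrt(80) * 15)
prob_more_than_7500 = 1 - sum_dist.cdf(7500)

# Part b: the sum that lies z = 1.5 standard deviations above the mean of the sums
sum_at_z = sum_dist.mean + 1.5 * sum_dist.stdev

print(round(prob_more_than_7500, 4))  # about 0.0127
print(round(sum_at_z, 1))             # about 7401.2
```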


7.4  Using  the  Central  Limit  Theorem'' 

It  is  important  for  you  to  understand  when  to  use  the  CLT.  If  you  are  being  asked  to  find  the  probability  of 
the  mean,  use  the  CLT  for  the  mean.  If  you  are  being  asked  to  find  the  probability  of  a  sum  or  total,  use  the 
CLT  for  sums.  This  also  applies  to  percentiles  for  means  and  sums. 

NOTE:  If  you  are  being  asked  to  find  the  probability  of  an  individual  value,  do  not  use  the  CLT. 
Use  the  distribution  of  its  random  variable. 

7.4.1  Examples  of  the  Central  Limit  Theorem 

Law  of  Large  Numbers 

The Law of Large Numbers says that if you take samples of larger and larger size from any population,
then the mean $\bar{x}$ of the sample tends to get closer and closer to $\mu$. From the Central Limit
Theorem, we know that as n gets larger and larger, the sample means follow a normal distribution. The
larger n gets, the smaller the standard deviation gets. (Remember that the standard deviation for
$\bar{X}$ is $\frac{\sigma}{\sqrt{n}}$.) This means that the sample mean $\bar{x}$ must be close to the
population mean $\mu$. We can say that $\mu$ is the value that the sample means approach as n gets
larger. The Central Limit Theorem illustrates the Law of Large Numbers.
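
A small simulation can make the Law of Large Numbers concrete. The sketch below is illustrative only: the choice of a fair die as the population, the seed, and the sample sizes are assumptions made for the example. It prints how far the sample mean lies from the population mean of 3.5 for samples of increasing size.

```python
import random
import statistics

rng = random.Random(11)   # arbitrary fixed seed (assumption) for repeatability
population_mean = 3.5     # mean of one roll of a fair die

gaps = {}
for n in (10, 1_000, 100_000):
    sample = [rng.randint(1, 6) for _ in range(n)]
    # Distance between this sample's mean and the population mean.
    gaps[n] = abs(statistics.fmean(sample) - population_mean)
    print(f"n = {n:6d}: |sample mean - mu| = {gaps[n]:.4f}")
```

A single run is not guaranteed to shrink at every step, since each sample is random, but the gap for the largest sample should be very small because the standard deviation of the sample mean is $\sigma / \sqrt{n}$.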

Central  Limit  Theorem  for  the  Mean  and  Sum  Examples 

*This content is available online at <http://cnx.org/content/m16958/1.21/>.

Example 7.4

A  study  involving  stress  is  done  on  a  college  campus  among  the  students.  The  stress  scores  follow 
a  uniform  distribution  with  the  lowest  stress  score  equal  to  1  and  the  highest  equal  to  5.  Using  a 
sample  of  75  students,  find: 

1.  The  probability  that  the  mean  stress  score  for  the  75  students  is  less  than  2. 

2.  The  90th  percentile  for  the  mean  stress  score  for  the  75  students. 

3.  The  probability  that  the  total  of  the  75  stress  scores  is  less  than  200. 

4.  The  90th  percentile  for  the  total  stress  score  for  the  75  students. 

Let  X  =  one  stress  score. 

Problems  1.  and  2.  ask  you  to  find  a  probability  or  a  percentile  for  a  mean.  Problems  3  and  4  ask 
you  to  find  a  probability  or  a  percentile  for  a  total  or  sum.  The  sample  size,  n,  is  equal  to  75. 

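Since the individual stress scores follow a uniform distribution, $X \sim U(1, 5)$ where a = 1 and
b = 5 (See Continuous Random Variables (Section 5.1) for the uniform).

$\mu_X = \frac{a + b}{2} = \frac{1 + 5}{2} = 3$

$\sigma_X = \sqrt{\frac{(b - a)^2}{12}} = \sqrt{\frac{(5 - 1)^2}{12}} = 1.15$

For problems 1. and 2., let $\bar{X}$ = the mean stress score for the 75 students. Then,

$\bar{X} \sim N\left(3, \frac{1.15}{\sqrt{75}}\right)$ where n = 75.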

Problem  1 

Find $P(\bar{x} < 2)$. Draw the graph.

Solution

$P(\bar{x} < 2) = 0$

The probability that the mean stress score is less than 2 is about 0.

[Figure: normal curve centered at 3 with the area for $P(\bar{x} < 2)$ shaded]

normalcdf(1, 2, 3, 1.15/√75) = 0

Reminder: The smallest stress score is 1. Therefore, the smallest mean for 75 stress scores is 1.


Problem  2 

Find the 90th percentile for the mean of 75 stress scores. Draw a graph.

Solution

Let k = the 90th percentile.
Find k where $P(\bar{x} < k) = 0.90$.
k = 3.2

The 90th percentile for the mean of 75 scores is about 3.2. This tells us that 90% of all the means of
75 stress scores are at most 3.2 and 10% are at least 3.2.

invNorm(0.90, 3, 1.15/√75) = 3.2

For problems 3 and 4, let $\Sigma X$ = the sum of the 75 stress scores. Then,
$\Sigma X \sim N\left((75) \cdot (3), \sqrt{75} \cdot 1.15\right)$
Problem  3 

Find $P(\Sigma x < 200)$. Draw the graph.
Solution

The mean of the sum of 75 stress scores is 75 · 3 = 225

The standard deviation of the sum of 75 stress scores is $\sqrt{75} \cdot 1.15 = 9.96$

$P(\Sigma x < 200) = 0$

[Figure: normal curve centered at 225 with the area for $P(\Sigma x < 200)$ shaded]

The probability that the total of 75 scores is less than 200 is about 0.
normalcdf(75, 200, 75 · 3, √75 · 1.15) = 0.

Reminder: The smallest total of 75 stress scores is 75 since the smallest single score is 1.


Problem  4 

Find  the  90th  percentile  for  the  total  of  75  stress  scores.  Draw  a  graph. 
Solution 

Let  k  =  the  90th  percentile. 


Find k where $P(\Sigma x < k) = 0.90$.
k = 237.8

The 90th percentile for the sum of 75 scores is about 237.8. This tells us that 90% of all the sums of
75 scores are no more than 237.8 and 10% are no less than 237.8.

invNorm(0.90, 75 · 3, √75 · 1.15) = 237.8
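
The four answers in Example 7.4 can be checked with the sketch below. It assumes Python 3.8 or later and uses `statistics.NormalDist` in place of normalcdf and invNorm; the variable names are illustrative. The code uses the unrounded standard deviation of the uniform distribution rather than 1.15, and it omits the lower bounds 1 and 75 used in the calculator calls, because the area below them is negligible.

```python
from math import sqrt
from statistics import NormalDist

a, b, n = 1, 5, 75
mu = (a + b) / 2                   # 3.0
sigma = sqrt((b - a) ** 2 / 12)    # about 1.15

mean_dist = NormalDist(mu, sigma / sqrt(n))      # distribution of the sample mean
sum_dist = NormalDist(n * mu, sqrt(n) * sigma)   # distribution of the sum

p_mean_below_2 = mean_dist.cdf(2)       # Problem 1
mean_p90 = mean_dist.inv_cdf(0.90)      # Problem 2
p_sum_below_200 = sum_dist.cdf(200)     # Problem 3
sum_p90 = sum_dist.inv_cdf(0.90)        # Problem 4

print(f"P(mean < 2)          = {p_mean_below_2:.4f}")
print(f"90th pct of the mean = {mean_p90:.2f}")
print(f"P(sum < 200)         = {p_sum_below_200:.4f}")
print(f"90th pct of the sum  = {sum_p90:.1f}")
```

The probability for the total comes out near 0.006, which the text reports as about 0.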


Example  7.5 

Suppose that a market research analyst for a cell phone company conducts a study of their
customers who exceed the time allowance included on their basic cell phone contract; the analyst
finds that for those people who exceed the time included in their basic contract, the excess time
used follows an exponential distribution with a mean of 22 minutes.

Consider  a  random  sample  of  80  customers  who  exceed  the  time  allowance  included  in  their  basic 
cell  phone  contract. 

Let X = the excess time used by one INDIVIDUAL cell phone customer who exceeds his contracted
time allowance.

$X \sim Exp\left(\frac{1}{22}\right)$. From Chapter 5, we know that $\mu = 22$ and $\sigma = 22$.

Let $\bar{X}$ = the mean excess time used by a sample of n = 80 customers who exceed their contracted
time allowance.

$\bar{X} \sim N\left(22, \frac{22}{\sqrt{80}}\right)$ by the CLT for Sample Means
Problem  1 

Using the CLT to find Probability:

a. Find the probability that the mean excess time used by the 80 customers in the sample is longer
than 20 minutes. This is asking us to find $P(\bar{x} > 20)$. Draw the graph.

b. Suppose that one customer who exceeds the time limit for his cell phone contract is randomly
selected. Find the probability that this individual customer's excess time is longer than 20
minutes. This is asking us to find $P(x > 20)$.

c. Explain why the probabilities in (a) and (b) are different.

Solution 
Part a.

Find: $P(\bar{x} > 20)$

$P(\bar{x} > 20) = 0.7919$ using normalcdf(20, 1E99, 22, 22/√80)

The probability is 0.7919 that the mean excess time used is more than 20 minutes, for a sample of
80 customers who exceed their contracted time allowance.

[Figure: normal curve centered at 22 with the area for $P(\bar{x} > 20)$ shaded]

Reminder: 1E99 = $10^{99}$ and -1E99 = $-10^{99}$. Press the EE key for E. Or just use 10^99 instead of
1E99.

Part b.

Find $P(x > 20)$. Remember to use the exponential distribution for an individual: $X \sim Exp(1/22)$.

$P(X > 20) = e^{-(1/22) \cdot 20}$ or $e^{-0.04545 \cdot 20} = 0.4029$

Part c. Explain why the probabilities in (a) and (b) are different.

$P(x > 20) = 0.4029$ but $P(\bar{x} > 20) = 0.7919$

The probabilities are not equal because we use different distributions to calculate the probability
for individuals and for means.

When asked to find the probability of an individual value, use the stated distribution of its
random variable; do not use the CLT. Use the CLT with the normal distribution when you are
being asked to find the probability for a mean.


Problem  2 

Using  the  CLT  to  find  Percentiles: 

Find the 95th percentile for the sample mean excess time for samples of 80 customers who exceed
their basic contract time allowances. Draw a graph.

Solution

Let k = the 95th percentile. Find k where $P(\bar{x} < k) = 0.95$
k = 26.0 using invNorm(0.95, 22, 22/√80) = 26.0

[Figure: normal curve centered at 22 with the area for $P(\bar{x} < k) = 0.95$ shaded]

The 95th percentile for the sample mean excess time used is about 26.0 minutes for random
samples of 80 customers who exceed their contractual allowed time.

95% of such samples would have means under 26 minutes; only 5% of such samples would have
means above 26 minutes.
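
The contrast between an individual value and a sample mean in Example 7.5 can be seen by computing both probabilities side by side. The sketch below assumes Python 3.8 or later; `statistics.NormalDist` replaces normalcdf and invNorm, and the variable names are illustrative.

```python
from math import exp, sqrt
from statistics import NormalDist

mean_excess = 22   # minutes; for an exponential distribution, mu = sigma = 22
n = 80
xbar_dist = NormalDist(mean_excess, mean_excess / sqrt(n))   # CLT for sample means

# Problem 1a: sample mean longer than 20 minutes (normal distribution via the CLT)
p_mean_over_20 = 1 - xbar_dist.cdf(20)

# Problem 1b: one individual longer than 20 minutes (exponential tail, no CLT)
p_individual_over_20 = exp(-20 / mean_excess)

# Problem 2: 95th percentile of the sample mean
p95_mean = xbar_dist.inv_cdf(0.95)

print(round(p_mean_over_20, 4))        # about 0.7919
print(round(p_individual_over_20, 4))  # about 0.4029
print(round(p95_mean, 1))              # about 26.0
```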


NOTE:  (HISTORICAL):  Normal  Approximation  to  the  Binomial 

Historically,  being  able  to  compute  binomial  probabilities  was  one  of  the  most  important  applications  of  the 
Central  Limit  Theorem.  Binomial  probabilities  were  displayed  in  a  table  in  a  book  with  a  small  value  for  n 
(say,  20).  To  calculate  the  probabilities  with  large  values  of  n,  you  had  to  use  the  binomial  formula  which 
could  be  very  complicated.  Using  the  Normal  Approximation  to  the  Binomial  simplified  the  process.  To 
compute  the  Normal  Approximation  to  the  Binomial,  take  a  simple  random  sample  from  a  population.  You 
must  meet  the  conditions  for  a  binomial  distribution: 

• there are a certain number n of independent trials
• the outcomes of any trial are success or failure
• each trial has the same probability of a success p

Recall that if X is the binomial random variable, then $X \sim B(n, p)$. The shape of the binomial distribution
needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must
both be greater than five (np > 5 and nq > 5; the approximation is better if they are both greater than or
equal to 10). Then the binomial can be approximated by the normal distribution with mean $\mu = np$ and
standard deviation $\sigma = \sqrt{npq}$. Remember that q = 1 - p. In order to get the best approximation, add 0.5 to
x or subtract 0.5 from x (use x + 0.5 or x - 0.5). The number 0.5 is called the continuity correction factor.

Example  7.6 

Suppose  in  a  local  Kindergarten  through  12th  grade  (K  -  12)  school  district,  53  percent  of  the 
population  favor  a  charter  school  for  grades  K  -  5.  A  simple  random  sample  of  300  is  surveyed. 

1.  Find  the  probability  that  at  least  150  favor  a  charter  school. 

2. Find the probability that at most 160 favor a charter school.

3. Find the probability that more than 155 favor a charter school.

4.  Find  the  probability  that  less  than  147  favor  a  charter  school. 

5.  Find  the  probability  that  exactly  175  favor  a  charter  school. 

Let X = the number that favor a charter school for grades K - 5. $X \sim B(n, p)$ where n = 300 and
p = 0.53. Since np > 5 and nq > 5, use the normal approximation to the binomial. The formulas
for the mean and standard deviation are $\mu = np$ and $\sigma = \sqrt{npq}$. The mean is 159 and the standard
deviation is 8.6447. The random variable for the normal distribution is Y. $Y \sim N(159, 8.6447)$. See
The Normal Distribution for help with calculator instructions.

For Problem 1., you include 150 so $P(x \geq 150)$ has normal approximation $P(Y \geq 149.5) = 0.8641$.
normalcdf(149.5, 10^99, 159, 8.6447) = 0.8641

For Problem 2., you include 160 so $P(x \leq 160)$ has normal approximation $P(Y \leq 160.5) = 0.5689$.
normalcdf(0, 160.5, 159, 8.6447) = 0.5689

For Problem 3., you exclude 155 so $P(x > 155)$ has normal approximation $P(Y > 155.5) = 0.6572$.
normalcdf(155.5, 10^99, 159, 8.6447) = 0.6572

For Problem 4., you exclude 147 so $P(x < 147)$ has normal approximation $P(Y < 146.5) = 0.0741$.
normalcdf(0, 146.5, 159, 8.6447) = 0.0741

For Problem 5., $P(x = 175)$ has normal approximation $P(174.5 < Y < 175.5) = 0.0083$.
normalcdf(174.5, 175.5, 159, 8.6447) = 0.0083

Because of calculators and computer software that easily let you calculate binomial probabilities
for large values of n, it is not necessary to use the Normal Approximation to the Binomial
provided you have access to these technology tools. Most school labs have Microsoft Excel, an
example of computer software that calculates binomial probabilities. Many students have access
to the TI-83 or 84 series calculators and they easily calculate probabilities for the binomial. In an
Internet browser, if you type in "binomial probability distribution calculation," you can find at
least one online calculator for the binomial.

For Example 7.6, the probabilities are calculated using the binomial (n = 300 and p = 0.53) below.
Compare the binomial and normal distribution answers. See Discrete Random Variables for help
with calculator instructions for the binomial.

$P(x \geq 150)$: 1 - binomialcdf(300, 0.53, 149) = 0.8641

$P(x \leq 160)$: binomialcdf(300, 0.53, 160) = 0.5684

$P(x > 155)$: 1 - binomialcdf(300, 0.53, 155) = 0.6576

$P(x < 147)$: binomialcdf(300, 0.53, 146) = 0.0742

$P(x = 175)$: (You use the binomial pdf.) binomialpdf(300, 0.53, 175) = 0.0083
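
The comparison between exact binomial probabilities and the normal approximation with the continuity correction can also be carried out in software. The sketch below is one possible way to do it in Python 3.8 or later using only the standard library; the helper names `binom_pmf` and `binom_cdf` are hypothetical and written for this example, not calculator or library functions.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 300, 0.53
q = 1 - p

def binom_pmf(k):
    """Exact P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * q**(n - k)

def binom_cdf(k):
    """Exact P(X <= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i) for i in range(k + 1))

approx = NormalDist(n * p, sqrt(n * p * q))   # Y ~ N(159, 8.6447)

# Each entry holds (exact binomial value, normal approximation with the 0.5 correction).
comparisons = {
    "P(x >= 150)": (1 - binom_cdf(149), 1 - approx.cdf(149.5)),
    "P(x <= 160)": (binom_cdf(160), approx.cdf(160.5)),
    "P(x > 155)": (1 - binom_cdf(155), 1 - approx.cdf(155.5)),
    "P(x < 147)": (binom_cdf(146), approx.cdf(146.5)),
    "P(x = 175)": (binom_pmf(175), approx.cdf(175.5) - approx.cdf(174.5)),
}
for label, (exact, normal) in comparisons.items():
    print(f"{label}: exact = {exact:.4f}, normal approx = {normal:.4f}")
```

For a sample this large with p near 0.5, the two columns should agree to about three decimal places.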

Contributions made to Example 2 by Roberta Bloom

7.5 Summary of Formulas^

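Formula 7.1: Central Limit Theorem for Sample Means
$\bar{X} \sim N\left(\mu_X, \frac{\sigma_X}{\sqrt{n}}\right)$    The Mean ($\bar{X}$): $\mu_X$

Formula 7.2: Central Limit Theorem for Sample Means Z-Score and Standard Error of the Mean
$z = \frac{\bar{x} - \mu_X}{\sigma_X / \sqrt{n}}$    Standard Error of the Mean (Standard Deviation ($\bar{X}$)): $\frac{\sigma_X}{\sqrt{n}}$

Formula 7.3: Central Limit Theorem for Sums
$\Sigma X \sim N\left(n \cdot \mu_X, \sqrt{n} \cdot \sigma_X\right)$    Mean for Sums ($\Sigma X$): $n \cdot \mu_X$

Formula 7.4: Central Limit Theorem for Sums Z-Score and Standard Deviation for Sums
$z = \frac{\Sigma x - n \cdot \mu_X}{\sqrt{n} \cdot \sigma_X}$    Standard Deviation for Sums ($\Sigma X$): $\sqrt{n} \cdot \sigma_X$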


^This content is available online at <http://cnx.org/content/m16956/1.8/>.

7.6 Practice: The Central Limit Theorem^

7.6.1  Student  Learning  Outcomes 

•  The  student  will  calculate  probabilities  using  the  Central  Limit  Theorem. 


7.6.2  Given 

Yoonie is a personnel manager in a large corporation. Each month she must review 16 of the employees.
From past experience, she has found that the reviews take her approximately 4 hours each to do with a
population standard deviation of 1.2 hours. Let X be the random variable representing the time it takes
her to complete one review. Assume X is normally distributed. Let $\bar{X}$ be the random variable representing
the mean time to complete the 16 reviews. Let $\Sigma X$ be the total time it takes Yoonie to complete all of the
month's reviews. Assume that the 16 reviews represent a random set of reviews.

7.6.3  Distribution 

Complete  the  distributions. 

1. $X \sim$ ______

2. $\bar{X} \sim$ ______

3. $\Sigma X \sim$ ______

7.6.4  Graphing  Probability 

For  each  problem  below: 

a.  Sketch  the  graph.  Label  and  scale  the  horizontal  axis.  Shade  the  region  corresponding  to  the  probability. 

b.  Calculate  the  value. 

Exercise  7.6.1  (Solution  on  p.  323.) 

Find  the  probability  that  one  review  will  take  Yoonie  from  3.5  to  4.25  hours. 


[Blank graph for sketching; horizontal axis: X]

a.

b. P(______ < x < ______) = ______

Exercise  7.6.2  (Solution  on  p.  323.) 

Find  the  probability  that  the  mean  of  a  month's  reviews  will  take  Yoonie  from  3.5  to  4.25  hrs. 

[Blank graph for sketching; horizontal axis: $\bar{X}$]

a.

b. P(______) = ______

^This content is available online at <http://cnx.org/content/m16954/1.12/>.

Exercise  7.6.3  (Solution  on  p.  323.) 

Find  the  95th  percentile  for  the  mean  time  to  complete  one  month's  reviews. 


[Blank graph for sketching; horizontal axis: $\bar{X}$]

a.

b. The 95th Percentile = ______

Exercise  7.6.4  (Solution  on  p.  323.) 

Find  the  probability  that  the  sum  of  the  month's  reviews  takes  Yoonie  from  60  to  65  hours. 


[Blank graph for sketching; horizontal axis: $\Sigma X$]

a.

b. The Probability = ______

Exercise  7.6.5  (Solution  on  p.  323.) 

Find the 95th percentile for the sum of the month's reviews.

a.

b. The 95th percentile = ______


7.6.5  Discussion  Question 
Exercise  7.6.6 

What causes the probabilities in Exercise 7.6.1 and Exercise 7.6.2 to differ?

7.7 Homework^

Exercise  7.7.1  (Solution  on  p.  323.) 

$X \sim N(60, 9)$. Suppose that you form random samples of 25 from this distribution. Let $\bar{X}$ be the
random variable of averages. Let $\Sigma X$ be the random variable of sums. For c - f, sketch the graph,
shade the region, label and scale the horizontal axis for $\bar{X}$, and find the probability.

a. Sketch the distributions of X and $\bar{X}$ on the same graph.

b. $\bar{X} \sim$ ______

c. $P(\bar{x} < 60)$ = ______

d. Find the 30th percentile for the mean.

e. $P(56 < \bar{x} < 62)$ = ______

f. $P(18 < \bar{x} < 58)$ = ______

g. $\Sigma X \sim$ ______

h. Find the minimum value for the upper quartile for the sum.

i. $P(1400 < \Sigma x < 1550)$ = ______

Exercise  7.7.2 

Determine  which  of  the  following  are  true  and  which  are  false.  Then,  in  complete  sentences, 
justify  your  answers. 

a. When the sample size is large, the mean of $\bar{X}$ is approximately equal to the mean of X.

b. When the sample size is large, $\bar{X}$ is approximately normally distributed.

c. When the sample size is large, the standard deviation of $\bar{X}$ is approximately the same as the
standard deviation of X.


Exercise  7.7.3  (Solution  on  p.  323.) 

The  percent  of  fat  calories  that  a  person  in  America  consumes  each  day  is  normally  distributed 
with  a  mean  of  about  36  and  a  standard  deviation  of  about  10.  Suppose  that  16  individuals  are 
randomly  chosen. 

Let $\bar{X}$ = average percent of fat calories.

a. $\bar{X} \sim$ ______(______, ______)

b. For the group of 16, find the probability that the average percent of fat calories consumed is
more than 5. Graph the situation and shade in the area to be determined.

c. Find the first quartile for the average percent of fat calories.

Exercise  7.7.4 

Previously,  De  Anza  statistics  students  estimated  that  the  amount  of  change  daytime  statistics 
students  carry  is  exponentially  distributed  with  a  mean  of  $0.88.  Suppose  that  we  randomly  pick 
25  daytime  statistics  students. 

a. In words, X = ______

b. $X \sim$ ______

c. In words, $\bar{X}$ = ______

d. $\bar{X} \sim$ ______(______, ______)

e. Find the probability that an individual had between $0.80 and $1.00. Graph the situation and
shade in the area to be determined.

f. Find the probability that the average of the 25 students was between $0.80 and $1.00. Graph the
situation and shade in the area to be determined.

g. Explain why there is a difference in (e) and (f).

''This content is available online at <http://cnx.org/content/m16952/1.24/>.

Exercise  7.7.5  (Solution  on  p.  323.) 

Suppose  that  the  distance  of  fly  balls  hit  to  the  outfield  (in  baseball)  is  normally  distributed  with 
a  mean  of  250  feet  and  a  standard  deviation  of  50  feet.  We  randomly  sample  49  fly  balls. 

a. If $\bar{X}$ = average distance in feet for 49 fly balls, then $\bar{X} \sim$ ______(______, ______)

b. What is the probability that the 49 balls traveled an average of less than 240 feet? Sketch the
graph. Scale the horizontal axis for $\bar{X}$. Shade the region corresponding to the probability.
Find the probability.

c. Find the 80th percentile of the distribution of the average of 49 fly balls.

Exercise  7.7.6 

Suppose  that  the  weight  of  open  boxes  of  cereal  in  a  home  with  children  is  uniformly  distributed 
from  2  to  6  pounds.  We  randomly  survey  64  homes  with  children. 

a. In words, X = ______

b. $X \sim$ ______

c. $\mu_X$ = ______

d. $\sigma_X$ = ______

e. In words, $\Sigma X$ = ______

f. $\Sigma X \sim$ ______

g. Find the probability that the total weight of open boxes is less than 250 pounds.

h. Find the 35th percentile for the total weight of open boxes of cereal.

Exercise  7.7.7  (Solution  on  p.  323.) 

Suppose  that  the  duration  of  a  particular  type  of  criminal  trial  is  known  to  have  a  mean  of  21  days 
and  a  standard  deviation  of  7  days.  We  randomly  sample  9  trials. 

a. In words, $\Sigma X$ = ______

b. $\Sigma X \sim$ ______

c.  Find  the  probability  that  the  total  length  of  the  9  trials  is  at  least  225  days. 

d.  90  percent  of  the  total  of  9  of  these  types  of  trials  will  last  at  least  how  long? 

Exercise  7.7.8 

According to the Internal Revenue Service, the average length of time for an individual to
complete (record keep, learn, prepare, copy, assemble and send) IRS Form 1040 is 10.53 hours (without
any attached schedules). The distribution is unknown. Let us assume that the standard deviation
is 2 hours. Suppose we randomly sample 36 taxpayers.

a. In words, X = ______

b. In words, $\bar{X}$ = ______

c. $\bar{X} \sim$ ______

d. Would you be surprised if the 36 taxpayers finished their Form 1040s in an average of more
than 12 hours? Explain why or why not in complete sentences.

e. Would you be surprised if one taxpayer finished his Form 1040 in more than 12 hours? In a
complete sentence, explain why.

Exercise  7.7.9  (Solution  on  p.  323.) 

Suppose  that  a  category  of  world  class  runners  are  known  to  run  a  marathon  (26  miles)  in  an 
average  of  145  minutes  with  a  standard  deviation  of  14  minutes.  Consider  49  of  the  races. 

Let $\bar{X}$ = the average of the 49 races.

a. $\bar{X} \sim$ ______

b. Find the probability that the runner will average between 142 and 146 minutes in these 49
marathons.

c. Find the 80th percentile for the average of these 49 marathons.

d. Find the median of the average running times.

Exercise  7.7.10 

The  attention  span  of  a  two  year-old  is  exponentially  distributed  with  a  mean  of  about  8  minutes. 
Suppose  we  randomly  survey  60  two  year-olds. 

a. In words, X = ______

b. $X \sim$ ______

c. In words, $\bar{X}$ = ______

d. $\bar{X} \sim$ ______

e. Before doing any calculations, which do you think will be higher? Explain why.

i. the probability that an individual attention span is less than 10 minutes; or

ii. the probability that the average attention span for the 60 children is less than 10 minutes?

Why?

f. Calculate the probabilities in part (e).

g. Explain why the distribution for $\bar{X}$ is not exponential.

Exercise  7.7.11  (Solution  on  p.  324.) 

Suppose  that  the  length  of  research  papers  is  uniformly  distributed  from  10  to  25  pages.  We 
survey  a  class  in  which  55  research  papers  were  turned  in  to  a  professor.  The  55  research  papers 
are  considered  a  random  collection  of  all  papers.  We  are  interested  in  the  average  length  of  the 
research  papers. 

a. In words, X = ______

b. $X \sim$ ______

c. $\mu_X$ = ______

d. $\sigma_X$ = ______

e. In words, $\bar{X}$ = ______

f. $\bar{X} \sim$ ______

g. In words, $\Sigma X$ = ______

h. $\Sigma X \sim$ ______

i. Without doing any calculations, do you think that it's likely that the professor will need to read
a total of more than 1050 pages? Why?

j. Calculate the probability that the professor will need to read a total of more than 1050 pages.

k. Why is it so unlikely that the average length of the papers will be less than 12 pages?

Exercise  7.7.12 

The  length  of  songs  in  a  collector's  CD  collection  is  uniformly  distributed  from  2  to  3.5  minutes. 
Suppose  we  randomly  pick  5  CDs  from  the  collection.  There  is  a  total  of  43  songs  on  the  5  CDs. 

a. In words, X =

b. X ~

c. In words, X̄ =

d. X̄ ~

e. Find the first quartile for the average song length.

f. The IQR (interquartile range) for the average song length is from ____ to ____.

Exercise 7.7.13 (Solution on p. 324.)

Salaries for teachers in a particular elementary school district are normally distributed with a
mean of $44,000 and a standard deviation of $6500. We randomly survey 10 teachers from that
district.

a. In words, X =

b. In words, X̄ =

c. X̄ ~

d. In words, ΣX =

e. ΣX ~

f. Find the probability that the teachers earn a total of over $400,000.

g. Find the 90th percentile for an individual teacher's salary.

h. Find the 90th percentile for the average teachers' salary.

i. If we surveyed 70 teachers instead of 10, graphically, how would that change the distribution
for X̄?

j. If each of the 70 teachers received a $3000 raise, graphically, how would that change the
distribution for X̄?

Exercise  7.7.14 

The distribution of income in some Third World countries is considered wedge shaped (many
very poor people, very few middle income people, and few to many wealthy people). Suppose we
pick a country with a wedge distribution. Let the average salary be $2000 per year with a standard
deviation of $8000. We randomly survey 1000 residents of that country.

a. In words, X =

b. In words, X̄ =

c. X̄ ~

d. How is it possible for the standard deviation to be greater than the average?

e. Why is it more likely that the average of the 1000 residents will be from $2000 to $2100 than
from $2100 to $2200?

Exercise  7.7.15  (Solution  on  p.  324.) 

The average length of a maternity stay in a U.S. hospital is said to be 2.4 days with a standard
deviation of 0.9 days. We randomly survey 80 women who recently bore children in a U.S. hospital.

a. In words, X =

b. In words, X̄ =

c. X̄ ~

d. In words, ΣX =

e. ΣX ~

f. Is it likely that an individual stayed more than 5 days in the hospital? Why or why not?

g. Is it likely that the average stay for the 80 women was more than 5 days? Why or why not?

h. Which is more likely:

i. an individual stayed more than 5 days; or

ii. the average stay of 80 women was more than 5 days?

i. If we were to sum up the women's stays, is it likely that, collectively, they spent more than a
year in the hospital? Why or why not?

Exercise  7.7.16 

In  1940  the  average  size  of  a  U.S.  farm  was  174  acres.  Let's  say  that  the  standard  deviation  was  55 
acres.  Suppose  we  randomly  survey  38  farmers  from  1940.  (Source:  U.S.  Dept.  of  Agriculture) 




a. In words, X =

b. In words, X̄ =

c. X̄ ~

d. The IQR for X̄ is from ____ acres to ____ acres.

Exercise  7.7.17  (Solution  on  p.  324.) 

The  stock  closing  prices  of  35  U.S.  semiconductor  manufacturers  are  given  below.  (Source:  Wall 
Street  Journal) 

8.625;  30.25;  27.625;  46.75;  32.875;  18.25;  5;  0.125;  2.9375;  6.875;  28.25;  24.25;  21;  1.5;  30.25;  71;  43.5; 
49.25;  2.5625;  31;  16.5;  9.5;  18.5;  18;  9;  10.5;  16.625;  1.25;  18;  12.875;  7;  12.875;  2.875;  60.25;  29.25 

a. In words, X =

b. i. x̄ =

ii. s_x =

iii. n =

c. Construct a histogram of the distribution. Start at x = -0.0005. Make bar widths of 10.

d. In words, describe the distribution of stock prices.

e. Randomly average 5 stock prices together. (Use a random number generator.) Continue
averaging 5 prices together until you have 10 averages. List those 10 averages.

f. Use the 10 averages from (e) to calculate:

i. x̄ =

ii. s_x̄ =

g. Construct a histogram of the distribution of the averages. Start at x = -0.0005. Make bar
widths of 10.

h. Does this histogram look like the graph in (c)?

i. In 1 - 2 complete sentences, explain why the graphs either look the same or look different.

j. Based upon the theory of the Central Limit Theorem, X̄ ~
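
Parts (e) and (f) ask for a random number generator. One possible way to carry them out is sketched below in Python, using only the standard library. The helper name `averages_of_groups` and the seed value are illustrative assumptions, and each group of 5 is drawn without replacement, which is one reasonable reading of "randomly average 5 stock prices together." Your own averages will differ if you use a different seed or a different method.

```python
import random
import statistics

# Closing prices of the 35 semiconductor stocks listed in the exercise.
prices = [
    8.625, 30.25, 27.625, 46.75, 32.875, 18.25, 5, 0.125, 2.9375, 6.875,
    28.25, 24.25, 21, 1.5, 30.25, 71, 43.5, 49.25, 2.5625, 31, 16.5, 9.5,
    18.5, 18, 9, 10.5, 16.625, 1.25, 18, 12.875, 7, 12.875, 2.875, 60.25,
    29.25,
]


def averages_of_groups(data, group_size, how_many, rng):
    """Return `how_many` averages, each of `group_size` randomly chosen values.

    Hypothetical helper: within a group, values are drawn without replacement.
    """
    return [statistics.mean(rng.sample(data, group_size)) for _ in range(how_many)]


rng = random.Random(2024)  # arbitrary seed so that the run is repeatable
averages = averages_of_groups(prices, group_size=5, how_many=10, rng=rng)

# Part (b): statistics for the 35 individual prices.
print("mean of the 35 prices:   ", round(statistics.mean(prices), 2))
print("std dev of the 35 prices:", round(statistics.stdev(prices), 2))

# Part (f): statistics for the 10 averages.
print("the 10 averages:", [round(a, 2) for a in averages])
print("mean of the averages:   ", round(statistics.mean(averages), 2))
print("std dev of the averages:", round(statistics.stdev(averages), 2))
```

The averages should cluster more tightly around the overall mean than the individual prices do, which is the behavior parts (g) through (j) ask you to examine.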

Exercise  7.7.18 

Use the Initial Public Offering data (Section 14.3.2: Stock Prices) (see "Table of Contents") to do this
problem.

a. In words, X =

b. i. μ_X =

ii. σ_X =

iii. n =

c. Construct a histogram of the distribution. Start at x = -0.50. Make bar widths of $5.

d. In words, describe the distribution of stock prices.

e. Randomly average 5 stock prices together. (Use a random number generator.) Continue
averaging 5 prices together until you have 15 averages. List those 15 averages.

f. Use the 15 averages from (e) to calculate the following:

i. x̄ =

ii. s_x̄ =

g. Construct a histogram of the distribution of the averages. Start at x = -0.50. Make bar widths
of $5.

h. Does this histogram look like the graph in (c)? Explain any differences.

i. In 1 - 2 complete sentences, explain why the graphs either look the same or look different.

j. Based upon the theory of the Central Limit Theorem, X̄ ~

7.7.1 Try these multiple choice questions (Exercises 19 - 23).

The  next  two  questions  refer  to  the  following  information:  The  time  to  wait  for  a  particular  rural  bus 
is  distributed  uniformly  from  0  to  75  minutes.  100  riders  are  randomly  sampled  to  learn  how  long  they 
waited. 

Exercise  7.7.19  (Solution  on  p.  324.) 

The  90th  percentile  sample  average  wait  time  (in  minutes)  for  a  sample  of  100  riders  is: 

A.  315.0 

B.  40.3 

C.  38.5 

D.  65.2 

Exercise  7.7.20  (Solution  on  p.  324.) 

Would you be surprised, based upon numerical calculations, if the sample average wait time (in
minutes) for 100 riders was less than 30 minutes?

A.  Yes 

B.  No 

C.  There  is  not  enough  information. 

Exercise  7.7.21  (Solution  on  p.  324.) 

Which  of  the  following  is  NOT  TRUE  about  the  distribution  for  averages? 

A.  The  mean,  median  and  mode  are  equal 

B.  The  area  under  the  curve  is  one 

C.  The  curve  never  touches  the  x-axis 

D.  The  curve  is  skewed  to  the  right 

The  next  three  questions  refer  to  the  following  information:  The  cost  of  unleaded  gasoline  in  the  Bay  Area 
once  followed  an  unknown  distribution  with  a  mean  of  $4.59  and  a  standard  deviation  of  $0.10.  Sixteen  gas 
stations  from  the  Bay  Area  are  randomly  chosen.  We  are  interested  in  the  average  cost  of  gasoline  for  the 
16  gas  stations. 

Exercise  7.7.22  (Solution  on  p.  324.) 

The  distribution  to  use  for  the  average  cost  of  gasoline  for  the  16  gas  stations  is 


A. X̄ ~ N(4.59, 0.10)

B. X̄ ~ N(4.59, 0.10/√16)


C.  X  ~  N  (4.59,  ^) 

D.  X~N  (4.59,0!^) 

Exercise  7.7.23  (Solution  on  p.  324.) 

What  is  the  probability  that  the  average  price  for  16  gas  stations  is  over  $4.69? 

A.  Almost  zero 

B.  0.1587 

C.  0.0943 

D.  Unknown 

Exercise  7.7.24  (Solution  on  p.  324.) 

Find the probability that the average price for 30 gas stations is less than $4.55.

A. 0.6554

B.  0.3446 

C.  0.0142 

D.  0.9858 

E.  0 

Exercise  7.7.25  (Solution  on  p.  324.) 

For  the  Charter  School  Problem  (Example  6)  in  Central  Limit  Theorem:  Using  the  Central  Limit 
Theorem,  calculate  the  following  using  the  normal  approximation  to  the  binomial. 

A.  Find  the  probability  that  less  than  100  favor  a  charter  school  for  grades  K  -  5. 

B.  Find  the  probability  that  170  or  more  favor  a  charter  school  for  grades  K  -  5. 

C.  Find  the  probability  that  no  more  than  140  favor  a  charter  school  for  grades  K  -  5. 

D.  Find  the  probability  that  there  are  fewer  than  130  that  favor  a  charter  school  for  grades  K  -  5. 

E.  Find  the  probability  that  exactly  150  favor  a  charter  school  for  grades  K  -  5. 

If  you  either  have  access  to  an  appropriate  calculator  or  computer  software,  try  calculating  these 
probabilities  using  the  technology.  Try  also  using  the  suggestion  that  is  at  the  bottom  of  Central 
Limit  Theorem:  Using  the  Central  Limit  Theorem  for  finding  a  website  that  calculates  binomial 
probabilities. 

Exercise  7.7.26  (Solution  on  p.  324.) 

Four  friends,  Janice,  Barbara,  Kathy  and  Roberta,  decided  to  carpool  together  to  get  to  school. 
Each  day  the  driver  would  be  chosen  by  randomly  selecting  one  of  the  four  names.  They  carpool 
to  school  for  96  days.  Use  the  normal  approximation  to  the  binomial  to  calculate  the  following 
probabilities.  Round  the  standard  deviation  to  4  decimal  places. 

A.  Find  the  probability  that  Janice  is  the  driver  at  most  20  days. 

B.  Find  the  probability  that  Roberta  is  the  driver  more  than  16  days. 

C.  Find  the  probability  that  Barbara  drives  exactly  24  of  those  96  days. 

If  you  either  have  access  to  an  appropriate  calculator  or  computer  software,  try  calculating  these 
probabilities  using  the  technology.  Try  also  using  the  suggestion  that  is  at  the  bottom  of  Central 
Limit Theorem: Using the Central Limit Theorem for finding a website that calculates binomial
probabilities. 
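
For readers using software, here is a minimal Python sketch of the carpool calculation, using only the standard library. It assumes the normal approximation is applied with the usual continuity correction of 0.5, and it also sums exact binomial probabilities for comparison. The variable names and the helper `binom_pmf` are illustrative, not part of the exercise.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 96, 0.25                           # 96 carpool days, 4 equally likely drivers
mu = n * p                                # mean number of days one friend drives
sigma = round(sqrt(n * p * (1 - p)), 4)   # standard deviation, rounded to 4 places
approx = NormalDist(mu, sigma)            # normal approximation to the binomial


def binom_pmf(k):
    """Exact binomial probability P(X = k) for n = 96, p = 0.25."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Normal approximation with continuity correction.
at_most_20 = approx.cdf(20.5)                       # P(X <= 20)
more_than_16 = 1 - approx.cdf(16.5)                 # P(X > 16)
exactly_24 = approx.cdf(24.5) - approx.cdf(23.5)    # P(X = 24)

# Exact binomial values for comparison.
exact_at_most_20 = sum(binom_pmf(k) for k in range(0, 21))
exact_more_than_16 = sum(binom_pmf(k) for k in range(17, n + 1))
exact_exactly_24 = binom_pmf(24)

print(f"P(X <= 20): approx {at_most_20:.4f}, exact {exact_at_most_20:.4f}")
print(f"P(X > 16):  approx {more_than_16:.4f}, exact {exact_more_than_16:.4f}")
print(f"P(X = 24):  approx {exactly_24:.4f}, exact {exact_exactly_24:.4f}")
```

Comparing the two columns of output shows how close the approximation is for a sample of this size.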

Exercise 24 contributed by Roberta Bloom

7.8 Review

The next three questions refer to the following information: Richard's Furniture Company delivers
furniture from 10 A.M. to 2 P.M. continuously and uniformly. We are interested in how long (in hours) past
the 10 A.M. start time that individuals wait for their delivery.

Exercise  7.8.1  (Solution  on  p.  325.) 

X ~

A. U(0,4)

B. U(10,2)

C. Exp(2)

D. N(2,1)


Exercise 7.8.2 (Solution on p. 325.)

The average wait time is:

A. 1 hour

B. 2 hours

C. 2.5 hours

D. 4 hours

Exercise  7.8.3  (Solution  on  p.  325.) 

Suppose that it is now past noon on a delivery day. The probability that a person must wait at
least 1 1/2 more hours is:

A. 1/4

B. 1/2

C. 3/4

D. i

Exercise 7.8.4 (Solution on p. 325.)
Given: X ~ Exp(1/3).

a. Find P(X > 1)

b. Calculate the minimum value for the upper quartile.

c. Find P(X = 1/3)

Exercise 7.8.5 (Solution on p. 325.)

• 40% of full-time students took 4 years to graduate

• 30% of full-time students took 5 years to graduate

• 20% of full-time students took 6 years to graduate

• 10% of full-time students took 7 years to graduate

The expected time for full-time students to graduate is:

A. 4 years

B. 4.5 years

C. 5 years

D. 5.5 years

**This content is available online at <http://cnx.org/content/m16955/1.12/>.

Exercise 7.8.6 (Solution on p. 325.)

Which  of  the  following  distributions  is  described  by  the  following  example? 

Many  people  can  run  a  short  distance  of  under  2  miles,  but  as  the  distance  increases,  fewer  people 
can  run  that  far. 


A.  Binomial 

B.  Uniform 

C.  Exponential 

D.  Normal 


Exercise  7.8.7  (Solution  on  p.  325.) 

The  length  of  time  to  brush  one's  teeth  is  generally  thought  to  be  exponentially  distributed  with 
a  mean  of  |  minutes.  Find  the  probability  that  a  randomly  selected  person  brushes  his/her  teeth 
less  than  |  minutes. 

A.  0.5 

B.  i 

C.  0.43 

D.  0.63 


Exercise  7.8.8  (Solution  on  p.  325.) 

Which distribution accurately describes the following situation?

The chance that a teenage boy regularly gives his mother a kiss goodnight (and he should!!) is
about 20%. Fourteen teenage boys are randomly surveyed.

X = the number of teenage boys that regularly give their mother a kiss goodnight

A. B(14, 0.20)

B.  P(2.8) 

C.  N  (2.8, 2.24) 

D-  Exp  (gig) 

Exercise  7.8.9  (Solution  on  p.  325.) 

Which  distribution  accurately  describes  the  following  situation? 

A 2008 report on technology use states that approximately 20 percent of U.S. households have
never sent an e-mail. (Source: http://www.webguild.org/2008/05/20-percent-of-americans-have-never-used-email.php)
Suppose that we select a random sample of fourteen U.S. households.

X = the number of households in a 2008 sample of 14 households that have never sent an email

A. B(14, 0.20)

B.  P(2.8) 

C.  N  (2.8, 2.24) 

Exp  (o^o) 


Exercise 9 contributed by Roberta Bloom

7.9 Lab 1: Central Limit Theorem (Pocket Change)

Class  Time: 
Names: 

7.9.1  Student  Learning  Outcomes: 

•  The  student  will  demonstrate  and  compare  properties  of  the  Central  Limit  Theorem. 

NOTE:  This  lab  works  best  when  sampling  from  several  classes  and  combining  data. 

7.9.2  Collect  the  Data 

1.  Count  the  change  in  your  pocket.  (Do  not  include  bills.) 

2.  Randomly  survey  30  classmates.  Record  the  values  of  the  change. 


Table  7.1 

3.  Construct  a  histogram.  Make  5-6  intervals.  Sketch  the  graph  using  a  ruler  and  pencil.  Scale  the  axes. 

Figure 7.1: blank axes for the histogram (vertical axis: Frequency; horizontal axis: Value of the Change)

This content is available online at <http://cnx.org/content/m16950/1.10/>.


4. Calculate the following (n = 1; surveying one person at a time):

a. x̄ =

b. s =

5.  Draw  a  smooth  curve  through  the  tops  of  the  bars  of  the  histogram.  Use  1-2  complete  sentences  to 
describe  the  general  shape  of  the  curve. 


7.9.3  Collecting  Averages  of  Pairs 

Repeat  steps  1  -  5  (of  the  section  above  titled  "Collect  the  Data")  with  one  exception.  Instead  of  recording 
the  change  of  30  classmates,  record  the  average  change  of  30  pairs. 

1.  Randomly  survey  30  pairs  of  classmates.  Record  the  values  of  the  average  of  their  change. 


Table  7.2 


2. Construct a histogram. Scale the axes using the same scaling you did for the section titled "Collect
the Data". Sketch the graph using a ruler and a pencil.

Figure 7.2: blank axes for the histogram (vertical axis: Frequency; horizontal axis: Value of the Change)


3.  Calculate  the  following  (n  =  2;  surveying  two  people  at  a  time): 

a. x̄ =

b.  s  = 

4.  Draw  a  smooth  curve  through  tops  of  the  bars  of  the  histogram.  Use  1-2  complete  sentences  to 
describe  the  general  shape  of  the  curve. 


7.9.4  Collecting  Averages  of  Groups  of  Five 

Repeat  steps  1-5  (of  the  section  titled  "Collect  the  Data")  with  one  exception.  Instead  of  recording  the 
change  of  30  classmates,  record  the  average  change  of  30  groups  of  5. 

1. Randomly survey 30 groups of 5 classmates. Record the values of the average of their change.


Table 7.3

2. Construct a histogram. Scale the axes using the same scaling you did for the section titled "Collect the
Data". Sketch the graph using a ruler and a pencil.

Figure 7.3: blank axes for the histogram (vertical axis: Frequency; horizontal axis: Value of the Change)


3.  Calculate  the  following  (n  =  5;  surveying  five  people  at  a  time): 

a. x̄ =

b.  s  = 

4.  Draw  a  smooth  curve  through  tops  of  the  bars  of  the  histogram.  Use  1-2  complete  sentences  to 
describe  the  general  shape  of  the  curve. 


7.9.5  Discussion  Questions 

1. As n changed, why did the shape of the distribution of the data change? Use 1-2 complete sentences
to explain what happened.

2. In the section titled "Collect the Data", what was the approximate distribution of the data? X ~

3. In the section titled "Collecting Averages of Groups of Five", what was the approximate distribution
of the averages? X̄ ~

4. In 1 - 2 complete sentences, explain any differences in your answers to the previous two questions.

7.10 Lab 2: Central Limit Theorem (Cookie Recipes)

Class  Time: 
Names: 

7.10.1  Student  Learning  Outcomes: 

•  The  student  will  demonstrate  and  compare  properties  of  the  Central  Limit  Theorem. 

7.10.2  Given: 

X = length of time (in days) that a cookie recipe lasted at the Olmstead Homestead. (Assume that each of
the different recipes makes the same quantity of cookies.)


Recipe #   X     Recipe #   X     Recipe #   X     Recipe #   X
1          1     16         2     31         3     46         2
2          5     17         2     32         4     47         2
3          2     18         4     33         5     48         11
4          5     19         6     34         6     49         5
5          6     20         1     35         6     50         5
6          1     21         6     36         1     51         4
7          2     22         5     37         1     52         6
8          6     23         2     38         2     53         5
9          5     24         5     39         1     54         1
10         2     25         1     40         6     55         1
11         5     26         6     41         1     56         2
12         1     27         4     42         6     57         4
13         1     28         1     43         2     58         3
14         3     29         6     44         6     59         6
15         2     30         2     45         2     60         5

Table 7.4


Calculate  the  following: 

a. μ_X =

b. σ_X =

This content is available online at <http://cnx.org/content/m16945/1.11/>.

7.10.3 Collect the Data

Use a random number generator to randomly select 4 samples of size n = 5 from the given population.
Record your samples below. Then, for each sample, calculate the mean to the nearest tenth. Record them in
the spaces provided. Record the sample means for the rest of the class.
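
If you use software as your random number generator, the selection could look like the following Python sketch. It is only one possible approach: the list `days` is transcribed from Table 7.4 in recipe order, the helper name `draw_samples` and the seed are illustrative assumptions, and recipes within a sample are chosen without replacement.

```python
import random
import statistics

# Days each of the 60 recipes lasted, in recipe order (transcribed from Table 7.4).
days = [
    1, 5, 2, 5, 6, 1, 2, 6, 5, 2, 5, 1, 1, 3, 2,     # recipes 1-15
    2, 2, 4, 6, 1, 6, 5, 2, 5, 1, 6, 4, 1, 6, 2,     # recipes 16-30
    3, 4, 5, 6, 6, 1, 1, 2, 1, 6, 1, 6, 2, 6, 2,     # recipes 31-45
    2, 2, 11, 5, 5, 4, 6, 5, 1, 1, 2, 4, 3, 6, 5,    # recipes 46-60
]


def draw_samples(population, sample_size, how_many, rng):
    """Draw `how_many` random samples of `sample_size` recipes.

    Hypothetical helper: returns (recipe numbers, values, mean to the nearest tenth)
    for each sample, choosing recipes without replacement within a sample.
    """
    results = []
    for _ in range(how_many):
        recipe_numbers = rng.sample(range(1, len(population) + 1), sample_size)
        values = [population[r - 1] for r in recipe_numbers]
        results.append((recipe_numbers, values, round(statistics.mean(values), 1)))
    return results


rng = random.Random(7)  # arbitrary seed; change it to get different samples
samples = draw_samples(days, sample_size=5, how_many=4, rng=rng)

for i, (recipes, values, mean) in enumerate(samples, start=1):
    print(f"Sample {i}: recipes {recipes}, days {values}, mean = {mean}")
```

Calling `draw_samples(days, sample_size=10, how_many=4, rng=rng)` would produce the larger samples that the lab asks for later.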

1.  Complete  the  table: 


            Sample 1    Sample 2    Sample 3    Sample 4    Sample means from other groups:




Means:      x̄ =         x̄ =         x̄ =         x̄ =

Table 7.5


2.  Calculate  the  following: 

a. x̄ =

b. s_x̄ =

3.  Again,  use  a  random  number  generator  to  randomly  select  4  samples  from  the  population.  This  time, 
make  the  samples  of  size  n  =  10.  Record  the  samples  below.  As  before,  for  each  sample,  calculate  the 
mean  to  the  nearest  tenth.  Record  them  in  the  spaces  provided.  Record  the  sample  means  for  the  rest 
of  the  class. 


            Sample 1    Sample 2    Sample 3    Sample 4    Sample means from other groups:




Means:      x̄ =         x̄ =         x̄ =         x̄ =

Table 7.6

4.  Calculate  the  following: 

a. x̄ =

b. s_x̄ =

5. For the original population, construct a histogram. Make intervals with bar width = 1 day. Sketch the
graph using a ruler and pencil. Scale the axes.

Figure 7.4: blank axes for the histogram (vertical axis: Frequency; horizontal axis: Time (days))


6.  Draw  a  smooth  curve  through  the  tops  of  the  bars  of  the  histogram.  Use  1-2  complete  sentences  to 
describe  the  general  shape  of  the  curve. 


7.10.4  Repeat  the  Procedure  for  n=5 

1. For the sample of n = 5 days averaged together, construct a histogram of the averages (your means
together with the means of the other groups). Make intervals with bar widths = 1/2 day. Sketch the
graph using a ruler and pencil. Scale the axes.

Figure 7.5: blank axes for the histogram (vertical axis: Frequency; horizontal axis: Time (days))


2.  Draw  a  smooth  curve  through  the  tops  of  the  bars  of  the  histogram.  Use  1-2  complete  sentences  to 
describe  the  general  shape  of  the  curve. 


7.10.5  Repeat  the  Procedure  for  n=10 

1. For the sample of n = 10 days averaged together, construct a histogram of the averages (your means
together with the means of the other groups). Make intervals with bar widths = 1/2 day. Sketch the
graph using a ruler and pencil. Scale the axes.

Figure 7.6: blank axes for the histogram (vertical axis: Frequency; horizontal axis: Time (days))


2.  Draw  a  smooth  curve  through  the  tops  of  the  bars  of  the  histogram.  Use  1-2  complete  sentences  to 
describe  the  general  shape  of  the  curve. 


7.10.6  Discussion  Questions 

1. Compare the three histograms you have made, the one for the population and the two for the sample
means. In three to five sentences, describe the similarities and differences.

2. State the theoretical (according to the CLT) distributions for the sample means.

a. n = 5: X̄ ~

b. n = 10: X̄ ~

3. Are the sample means for n = 5 and n = 10 "close" to the theoretical mean, μ_X? Explain why or why
not.

4.  Which  of  the  two  distributions  of  sample  means  has  the  smaller  standard  deviation?  Why? 

5.  As  n  changed,  why  did  the  shape  of  the  distribution  of  the  data  change?  Use  1-2  complete  sentences 
to  explain  what  happened. 


NOTE: This lab was designed and contributed by Carol Olmstead.

Solutions to Exercises in Chapter 7

Solutions  to  Practice:  The  Central  Limit  Theorem 

Solution  to  Exercise  7.6.1  (p.  302) 

b.  3.5, 4.25,  0.2441 

Solution  to  Exercise  7.6.2  (p.  302) 

b.  0.7499 

Solution  to  Exercise  7.6.3  (p.  303) 

b.  4.49  hours 

Solution  to  Exercise  7.6.4  (p.  303) 
b.  0.3802 

Solution  to  Exercise  7.6.5  (p.  303) 
b. 71.90

Solutions  to  Homework 
Solution  to  Exercise  7.7.1  (p.  305) 

b.  Xbar~N  (60,  ^) 

c.  0.5000 

d.  59.06 

e.  0.8536 

f.  0.1333 

h.  1530.35 

i.  0.8536 

Solution  to  Exercise  7.7.3  (p.  305) 

b.  1 

c.  34.31 

Solution  to  Exercise  7.7.5  (p.  306) 

b.  0.0808 

c.  256.01  feet 

Solution  to  Exercise  7.7.7  (p.  306) 

a.  The  total  length  of  time  for  9  criminal  trials 

b.  N  (189,21) 

c.  0.0432 

d.  162.09 

Solution to Exercise 7.7.9 (p. 306)

b. 0.6247

c.  146.68 

d.  145  minutes 
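
These values can be checked with software. The sketch below is one way to do so in Python's standard library, assuming (as the Central Limit Theorem suggests for this exercise) that the average of the 49 races is modeled as normal with mean 145 and standard deviation 14/√49 = 2. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 145, 14, 49
xbar = NormalDist(mu, sigma / sqrt(n))   # model for the average of the 49 races

prob_between = xbar.cdf(146) - xbar.cdf(142)   # part b: P(142 < X-bar < 146)
percentile_80 = xbar.inv_cdf(0.80)             # part c: 80th percentile
median = xbar.inv_cdf(0.50)                    # part d: median of the averages

print(f"b. {prob_between:.4f}")
print(f"c. {percentile_80:.2f}")
print(f"d. {median:.0f} minutes")
```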

Solution to Exercise 7.7.11 (p. 307)

b. U(10, 25)

c. 17.5

d. σ_X = (25 - 10)/√12 = 4.3301

f. N(17.5, 0.5839)

h. N(962.5, 32.11)

j. 0.0032

Solution to Exercise 7.7.13 (p. 308)

f. 0.9742

g. $52,330

h. $46,634

Solution to Exercise 7.7.15 (p. 308)

e. N(192, 8.05)

h. Individual

Solution to Exercise 7.7.17 (p. 309)

b. $20.71; $17.31; 35

d. Exponential distribution, X ~ Exp(1/20.71)

f. $20.71; $11.14

Solution to Exercise 7.7.19 (p. 310)

B

Solution to Exercise 7.7.20 (p. 310)

A

Solution to Exercise 7.7.21 (p. 310)

D

Solution to Exercise 7.7.22 (p. 310)

B

Solution to Exercise 7.7.23 (p. 310)

A

Solution to Exercise 7.7.24 (p. 310)

C

Solution to Exercise 7.7.25 (p. 311)

C. 0.0162

E. 0.0268

Solution to Exercise 7.7.26 (p. 311)

A. 0.2047

B. 0.9615

C. 0.0938

Solutions to Review

Solution  to  Exercise  7.8.1  (p.  312) 

A 

Solution  to  Exercise  7.8.2  (p.  312) 

B 

Solution  to  Exercise  7.8.3  (p.  312) 

A 

Solution  to  Exercise  7.8.4  (p.  312) 

a.  0.7165 

b.  4.16 

c.  0 

Solution  to  Exercise  7.8.5  (p.  312) 

C 

Solution  to  Exercise  7.8.6  (p.  313) 

C 

Solution  to  Exercise  7.8.7  (p.  313) 

D 

Solution  to  Exercise  7.8.8  (p.  313) 

A 

Solution to Exercise 7.8.9 (p. 313)

A

Chapter 8

Confidence Intervals


8.1 Confidence Intervals

8.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

•  Calculate  and  interpret  confidence  intervals  for  one  population  mean  and  one  population  proportion. 

• Interpret the Student's-t probability distribution as the sample size changes.

• Discriminate between problems applying the normal and the Student's-t distributions.


8.1.2  Introduction 

Suppose  you  are  trying  to  determine  the  mean  rent  of  a  two-bedroom  apartment  in  your  town.  You  might 
look  in  the  classified  section  of  the  newspaper,  write  down  several  rents  listed,  and  average  them  together. 
You  would  have  obtained  a  point  estimate  of  the  true  mean.  If  you  are  trying  to  determine  the  percent  of 
times  you  make  a  basket  when  shooting  a  basketball,  you  might  count  the  number  of  shots  you  make  and 
divide  that  by  the  number  of  shots  you  attempted.  In  this  case,  you  would  have  obtained  a  point  estimate 
for  the  true  proportion. 

We  use  sample  data  to  make  generalizations  about  an  unknown  population.  This  part  of  statistics  is  called 
inferential  statistics.  The  sample  data  help  us  to  make  an  estimate  of  a  population  parameter.  We  realize 
that  the  point  estimate  is  most  likely  not  the  exact  value  of  the  population  parameter,  but  close  to  it.  After 
calculating  point  estimates,  we  construct  confidence  intervals  in  which  we  believe  the  parameter  lies. 

In  this  chapter,  you  will  learn  to  construct  and  interpret  confidence  intervals.  You  will  also  learn  a  new 
distribution,  the  Student's-t,  and  how  it  is  used  with  these  intervals.  Throughout  the  chapter,  it  is  important 
to  keep  in  mind  that  the  confidence  interval  is  a  random  variable.  It  is  the  parameter  that  is  fixed. 

If you worked in the marketing department of an entertainment company, you might be interested in the
mean number of compact discs (CDs) a consumer buys per month. If so, you could conduct a survey
and calculate the sample mean, x̄, and the sample standard deviation, s. You would use x̄ to estimate
the population mean and s to estimate the population standard deviation. The sample mean, x̄, is the
point estimate for the population mean, μ. The sample standard deviation, s, is the point estimate for the
population standard deviation, σ.

This content is available online at <http://cnx.org/content/m16967/1.16/>.

Each of x̄ and s is also called a statistic.

A confidence interval is another type of estimate but, instead of being just one number, it is an interval
of numbers. The interval of numbers is a range of values calculated from a given set of sample data. The
confidence interval is likely to include an unknown population parameter.

Suppose for the CD example we do not know the population mean μ but we do know that the population
standard deviation is σ = 1 and our sample size is 100. Then by the Central Limit Theorem, the standard
deviation for the sample mean is

σ/√n = 1/√100 = 0.1

The Empirical Rule, which applies to bell-shaped distributions, says that in approximately 95% of the
samples, the sample mean, x̄, will be within two standard deviations of the population mean μ. For our CD
example, two standard deviations is (2)(0.1) = 0.2. The sample mean x̄ is likely to be within 0.2 units of μ.

Because x̄ is within 0.2 units of μ, which is unknown, then μ is likely to be within 0.2 units of x̄ in 95%
of the samples. The population mean μ is contained in an interval whose lower number is calculated by
taking the sample mean and subtracting two standard deviations ((2)(0.1)) and whose upper number is
calculated by taking the sample mean and adding two standard deviations. In other words, μ is between
x̄ - 0.2 and x̄ + 0.2 in 95% of all the samples.

For the CD example, suppose that a sample produced a sample mean x̄ = 2. Then the unknown population
mean μ is between

x̄ - 0.2 = 2 - 0.2 = 1.8 and x̄ + 0.2 = 2 + 0.2 = 2.2

We say that we are 95% confident that the unknown population mean number of CDs is between 1.8 and
2.2. The 95% confidence interval is (1.8, 2.2).

The 95% confidence interval implies two possibilities. Either the interval (1.8, 2.2) contains the true mean μ
or our sample produced an x̄ that is not within 0.2 units of the true mean μ. The second possibility happens
for only 5% of all the samples (100% - 95%).

Remember that a confidence interval is created for an unknown population parameter like the population
mean, μ. Confidence intervals for some parameters have the form

(point  estimate  -  margin  of  error,  point  estimate  +  margin  of  error) 

The  margin  of  error  depends  on  the  confidence  level  or  percentage  of  confidence. 

When  you  read  newspapers  and  journals,  some  reports  will  use  the  phrase  "margin  of  error."  Other  reports 
will  not  use  that  phrase,  but  include  a  confidence  interval  as  the  point  estimate  +  or  -  the  margin  of  error. 
These  are  two  ways  of  expressing  the  same  concept. 

NOTE: Although the text only covers symmetric confidence intervals, there are non-symmetric
confidence intervals (for example, a confidence interval for the standard deviation).


8.1.3  Optional  Collaborative  Classroom  Activity 

Have your instructor record the number of meals each student in your class eats out in a week. Assume
that the standard deviation is known to be 3 meals. Construct an approximate 95% confidence interval for
the true mean number of meals students eat out each week.

1. Calculate the sample mean.

2. σ = 3 and n = the number of students surveyed.

3. Construct the interval (x̄ - 2·σ/√n, x̄ + 2·σ/√n)

We say we are approximately 95% confident that the true average number of meals that students eat out in
a week is between ____ and ____.
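
The three steps of this activity can be sketched in Python as follows. The list `meals` is made-up data standing in for a class survey, and the function name `approx_95_interval` is an illustrative assumption. The multiplier 2 comes from the Empirical Rule used earlier in this section, so the interval is only approximately a 95% interval.

```python
from math import sqrt
from statistics import mean


def approx_95_interval(values, sigma):
    """Approximate 95% interval: sample mean plus or minus 2 * sigma / sqrt(n)."""
    n = len(values)
    x_bar = mean(values)              # step 1: the sample mean
    margin = 2 * sigma / sqrt(n)      # two standard deviations of the sample mean
    return x_bar - margin, x_bar + margin


# Hypothetical survey: meals eaten out in a week by 16 students.
meals = [2, 5, 0, 3, 7, 4, 1, 6, 3, 2, 4, 5, 3, 0, 2, 8]

low, high = approx_95_interval(meals, sigma=3)   # step 2: sigma = 3, n = 16
print(f"Approximate 95% confidence interval: ({low:.2f}, {high:.2f})")
```

With these made-up numbers the sample mean is 3.4375 and the margin is 2(3)/√16 = 1.5, so the interval runs from about 1.94 to 4.94 meals.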


8.2 Confidence Interval, Single Population Mean, Population Standard
Deviation Known, Normal

8.2.1  Calculating  the  Confidence  Interval 

To construct a confidence interval for a single unknown population mean μ, where the population standard
deviation is known, we need x̄ as an estimate for μ and we need the margin of error. Here, the
margin of error is called the error bound for a population mean (abbreviated EBM). The sample mean x̄ is
the point estimate of the unknown population mean μ.

The  confidence  interval  estimate  will  have  the  form: 

(point estimate − error bound, point estimate + error bound) or, in symbols, (x̄ − EBM, x̄ + EBM)

The margin of error depends on the confidence level (abbreviated CL). The confidence level is often considered
the probability that the calculated confidence interval estimate will contain the true population
parameter. However, it is more accurate to state that the confidence level is the percent of confidence intervals
that contain the true population parameter when repeated samples are taken. Most often, the
person constructing the confidence interval chooses a confidence level of 90% or higher
because that person wants to be reasonably certain of his or her conclusions.

There is another probability called alpha (α). α is related to the confidence level CL: α is the probability that
the interval does not contain the unknown population parameter.
Mathematically, α + CL = 1.

Example  8.1 

Suppose we have collected data from a sample. We know the sample mean but we do not know
the mean for the entire population.
The sample mean is 7 and the error bound for the mean is 2.5.

x̄ = 7 and EBM = 2.5.

The confidence interval is (7 − 2.5, 7 + 2.5); calculating the values gives (4.5, 9.5).

If  the  confidence  level  (CL)  is  95%,  then  we  say  that  "We  estimate  with  95%  confidence  that  the 
true  value  of  the  population  mean  is  between  4.5  and  9.5." 

A confidence interval for a population mean with a known standard deviation is based on the fact that the
sample means follow an approximately normal distribution. Suppose that our sample has a mean of x̄ = 10
and we have constructed the 90% confidence interval (5, 15) where EBM = 5.

To get a 90% confidence interval, we must include the central 90% of the probability of the normal distribution.
If we include the central 90%, we leave out a total of α = 10% in both tails, or 5% in each tail, of the
normal distribution.

This content is available online at <http://cnx.org/content/m16962/1.23/>.

[Figure: normal curve centered at x̄ = 10 with confidence level CL = 0.90 in the middle and EBM = 5; the endpoints x̄ − EBM = 5 and x̄ + EBM = 15 are marked.]

μ is believed to be in the interval (5, 15) with 90% confidence.

To  capture  the  central  90%,  we  must  go  out  1.645  "standard  deviations"  on  either  side  of  the  calculated 
sample  mean.  1.645  is  the  z-score  from  a  Standard  Normal  probability  distribution  that  puts  an  area  of  0.90 
in  the  center,  an  area  of  0.05  in  the  far  left  tail,  and  an  area  of  0.05  in  the  far  right  tail. 

It is important that the "standard deviation" used be appropriate for the parameter we are estimating.
So in this section, we need to use the standard deviation that applies to sample means, which is $\frac{\sigma}{\sqrt{n}}$. The fraction $\frac{\sigma}{\sqrt{n}}$ is
commonly called the "standard error of the mean" in order to clearly distinguish the standard deviation for
a mean from the population standard deviation σ.

In summary, as a result of the Central Limit Theorem:

• X̄ is normally distributed, that is, $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$.

• When the population standard deviation σ is known, we use a Normal distribution to calculate
the error bound.

Calculating  the  Confidence  Interval: 

To  construct  a  confidence  interval  estimate  for  an  unknown  population  mean,  we  need  data  from  a  random 
sample.  The  steps  to  construct  and  interpret  the  confidence  interval  are: 

• Calculate the sample mean x̄ from the sample data. Remember, in this section, we already know the
population standard deviation σ.

•  Find  the  Z-score  that  corresponds  to  the  confidence  level. 

•  Calculate  the  error  bound  EBM 

•  Construct  the  confidence  interval 

•  Write  a  sentence  that  interprets  the  estimate  in  the  context  of  the  situation  in  the  problem.  (Explain 
what  the  confidence  interval  means,  in  the  words  of  the  problem.) 

We  will  first  examine  each  step  in  more  detail,  and  then  illustrate  the  process  with  some  examples. 

Finding  z  for  the  stated  Confidence  Level 

When we know the population standard deviation σ, we use a standard normal distribution to calculate
the error bound EBM and construct the confidence interval. We need to find the value of z that puts an area
equal to the confidence level (in decimal form) in the middle of the standard normal distribution Z ~ N(0, 1).

The confidence level, CL, is the area in the middle of the standard normal distribution. CL = 1 − α. So α is
the area that is split equally between the two tails. Each of the tails contains an area equal to α/2.

The z-score that has an area to the right of α/2 is denoted by $z_{\alpha/2}$.

For example, when CL = 0.95 then α = 0.05 and α/2 = 0.025; we write $z_{\alpha/2} = z_{0.025}$.
The area to the right of $z_{0.025}$ is 0.025 and the area to the left of $z_{0.025}$ is 1 − 0.025 = 0.975.
$z_{\alpha/2} = z_{0.025} = 1.96$, using a calculator, computer or a Standard Normal probability table.
Using the TI83, TI83+ or TI84+ calculator: invNorm(0.975, 0, 1) = 1.96

CALCULATOR NOTE: Remember to use area to the LEFT of $z_{\alpha/2}$; in this chapter the last two inputs in the
invNorm command are 0, 1 because you are using a Standard Normal Distribution Z ~ N(0, 1).
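
For readers working on a computer instead of a TI calculator, the same lookup can be sketched in Python. This is only a minimal sketch, assuming Python 3.8 or later, whose standard library provides statistics.NormalDist. The helper name z_critical is made up for this illustration. Like invNorm, the inv_cdf method takes the area to the LEFT of the z-score.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Return z_{alpha/2}, the z-score with area alpha/2 to its right.

    z_critical is a hypothetical helper name, not a library function.
    """
    alpha = 1 - confidence_level
    # inv_cdf expects the area to the LEFT of the z-score, like invNorm.
    return NormalDist(mu=0, sigma=1).inv_cdf(1 - alpha / 2)


z_95 = z_critical(0.95)  # plays the role of invNorm(0.975, 0, 1)
z_90 = z_critical(0.90)  # plays the role of invNorm(0.95, 0, 1)
print(round(z_95, 2), round(z_90, 3))
```

Rounded to the precision used in this chapter, the two values should agree with 1.96 and 1.645.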

EBM:  Error  Bound 

The error bound formula for an unknown population mean μ when the population standard deviation σ is
known is

• $EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

Constructing  the  Confidence  Interval 

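• The confidence interval estimate has the format (x̄ − EBM, x̄ + EBM).
The graph gives a picture of the entire situation.

CL + α/2 + α/2 = CL + α = 1.

[Figure: normal curve with area CL = 1 − α in the middle and area α/2 in each tail; the middle region runs from x̄ − EBM to x̄ + EBM and is centered at x̄.]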

Writing  the  Interpretation 

The interpretation should clearly state the confidence level (CL), explain what population parameter is
being estimated (here, a population mean), and should state the confidence interval (both endpoints). "We
estimate with ___% confidence that the true population mean (include context of the problem) is between
___ and ___ (include appropriate units)."

Example  8.2 

Suppose  scores  on  exams  in  statistics  are  normally  distributed  with  an  unknown  population  mean 
and  a  population  standard  deviation  of  3  points.  A  random  sample  of  36  scores  is  taken  and  gives 
a  sample  mean  (sample  mean  score)  of  68.  Find  a  confidence  interval  estimate  for  the  population 
mean  exam  score  (the  mean  score  on  all  exams). 

Problem 

Find a 90% confidence interval for the true (population) mean of statistics exam scores.

Solution

• You can use technology to directly calculate the confidence interval.

• The first solution is shown step-by-step (Solution A).

• The second solution uses the TI-83, 83+ and 84+ calculators (Solution B).

Solution  A 

To find the confidence interval, you need the sample mean, x̄, and the EBM.

x̄ = 68

σ = 3; n = 36; The confidence level is 90% (CL = 0.90)
CL = 0.90 so α = 1 − CL = 1 − 0.90 = 0.10
α/2 = 0.05, so $z_{\alpha/2} = z_{0.05}$

The area to the right of $z_{0.05}$ is 0.05 and the area to the left of $z_{0.05}$ is 1 − 0.05 = 0.95
$z_{\alpha/2} = z_{0.05} = 1.645$

using invNorm(0.95, 0, 1) on the TI-83, 83+, 84+ calculators. This can also be found using appropriate
commands on other calculators, using a computer, or using a probability table for the Standard
Normal distribution.

$EBM = 1.645 \cdot \frac{3}{\sqrt{36}} = 0.8225$

x̄ − EBM = 68 − 0.8225 = 67.1775

x̄ + EBM = 68 + 0.8225 = 68.8225

The 90% confidence interval is (67.1775, 68.8225).

Solution  B 

Using  a  function  of  the  TI-83,  TI-83+  or  TI-84  calculators: 

Press STAT and arrow over to TESTS.
Arrow down to 7:ZInterval.
Press ENTER.

Arrow to Stats and press ENTER.

Arrow down and enter 3 for σ, 68 for x̄, 36 for n, and .90 for C-level.

Arrow  down  to  Calculate  and  press  ENTER. 

The  confidence  interval  is  (to  3  decimal  places)  (67.178, 68.822). 
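
The same interval can also be checked on a computer. The following is a minimal sketch, assuming Python 3.8 or later; z_interval is a hypothetical helper written for this example, not a library function. Because it uses the unrounded z-score, its result should agree with the calculator output to 3 decimal places and differ from Solution A only in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(sample_mean, sigma, n, confidence_level):
    """Confidence interval for a mean when sigma is known (hypothetical helper)."""
    alpha = 1 - confidence_level
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    ebm = z * sigma / sqrt(n)                # error bound for the mean
    return sample_mean - ebm, sample_mean + ebm


# Values from Example 8.2: x-bar = 68, sigma = 3, n = 36, CL = 0.90
lower, upper = z_interval(68, 3, 36, 0.90)
print(round(lower, 3), round(upper, 3))
```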

Interpretation 

We estimate with 90% confidence that the true population mean exam score for all statistics students
is between 67.18 and 68.82.

Explanation  of  90%  Confidence  Level 

90%  of  all  confidence  intervals  constructed  in  this  way  contain  the  true  mean  statistics  exam  score. 
For  example,  if  we  constructed  100  of  these  confidence  intervals,  we  would  expect  90  of  them  to 
contain the true population mean exam score.

8.2.2 Changing the Confidence Level or Sample Size

Example 8.3: Changing the Confidence Level

Suppose  we  change  the  original  problem  by  using  a  95%  confidence  level.  Find  a  95%  confidence 
interval  for  the  true  (population)  mean  statistics  exam  score. 

Solution 

To find the confidence interval, you need the sample mean, x̄, and the EBM.

x̄ = 68

σ = 3; n = 36; The confidence level is 95% (CL = 0.95)

CL = 0.95 so α = 1 − CL = 1 − 0.95 = 0.05
α/2 = 0.025, so $z_{\alpha/2} = z_{0.025}$

The area to the right of $z_{0.025}$ is 0.025 and the area to the left of $z_{0.025}$ is 1 − 0.025 = 0.975
$z_{\alpha/2} = z_{0.025} = 1.96$

using invNorm(0.975, 0, 1) on the TI-83, 83+, 84+ calculators. (This can also be found using appropriate
commands on other calculators, using a computer, or using a probability table for the Standard
Normal distribution.)

$EBM = 1.96 \cdot \frac{3}{\sqrt{36}} = 0.98$

x̄ − EBM = 68 − 0.98 = 67.02

x̄ + EBM = 68 + 0.98 = 68.98

Interpretation 

We estimate with 95% confidence that the true population mean for all statistics exam scores is
between 67.02 and 68.98.

Explanation  of  95%  Confidence  Level 

95%  of  all  confidence  intervals  constructed  in  this  way  contain  the  true  value  of  the  population 
mean  statistics  exam  score. 

Comparing  the  results 

The  90%  confidence  interval  is  (67.18,  68.82).  The  95%  confidence  interval  is  (67.02,  68.98).  The 
95%  confidence  interval  is  wider.  If  you  look  at  the  graphs,  because  the  area  0.95  is  larger  than  the 
area 0.90, it makes sense that the 95% confidence interval is wider.

[Figure 8.1: two normal curves, panels (a) and (b); the panel labeled 0.025, 0.95, 0.025 shows central area 0.95 with area 0.025 in each tail.]

Summary:  Effect  of  Changing  the  Confidence  Level 

•  Increasing  the  confidence  level  increases  the  error  bound,  making  the  confidence  interval 
wider. 

•  Decreasing  the  confidence  level  decreases  the  error  bound,  making  the  confidence  interval 
narrower. 
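
The effect of the confidence level can be seen numerically. The sketch below assumes Python 3.8 or later and reuses σ = 3 and n = 36 from the exam-score example; the 80% and 99% levels are extra values chosen here only for illustration and do not appear in the examples above.

```python
from math import sqrt
from statistics import NormalDist

sigma, n = 3, 36  # values from the exam-score example

# Error bound EBM = z_{alpha/2} * sigma / sqrt(n) for several confidence levels
ebm_by_level = {}
for confidence_level in (0.80, 0.90, 0.95, 0.99):
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    ebm_by_level[confidence_level] = z * sigma / sqrt(n)
    print(confidence_level, round(ebm_by_level[confidence_level], 4))
```

Each increase in the confidence level should print a larger error bound, and therefore a wider interval around x̄ = 68.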

Example  8.4:  Changing  the  Sample  Size: 

Suppose  we  change  the  original  problem  to  see  what  happens  to  the  error  bound  if  the  sample  size 
is  changed. 

Problem 

Leave  everything  the  same  except  the  sample  size.  Use  the  original  90%  confidence  level.  What 
happens  to  the  error  bound  and  the  confidence  interval  if  we  increase  the  sample  size  and  use 
n=100  instead  of  n=36?  What  happens  if  we  decrease  the  sample  size  to  n=25  instead  of  n=36? 

• x̄ = 68

• σ = 3; The confidence level is 90% (CL = 0.90); $z_{\alpha/2} = z_{0.05} = 1.645$

Solution A

If we increase the sample size n to 100, we decrease the error bound.

When n = 100: $EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 1.645 \cdot \frac{3}{\sqrt{100}} = 0.4935$

Solution B

If we decrease the sample size n to 25, we increase the error bound.

When n = 25: $EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}} = 1.645 \cdot \frac{3}{\sqrt{25}} = 0.987$
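
The error bounds for the three sample sizes can be compared side by side. This short Python sketch uses the rounded z-score 1.645 from the example so that the numbers match the hand calculations; the variable names are chosen here for illustration.

```python
from math import sqrt

z = 1.645  # z_{0.05} as rounded in the example
sigma = 3

# EBM = z * sigma / sqrt(n) for the three sample sizes discussed
ebm_by_n = {n: z * sigma / sqrt(n) for n in (25, 36, 100)}
for n, ebm in ebm_by_n.items():
    # Print n, the error bound, and the interval around x-bar = 68
    print(n, round(ebm, 4), (round(68 - ebm, 4), round(68 + ebm, 4)))
```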


Summary:  Effect  of  Changing  the  Sample  Size 

• Increasing the sample size causes the error bound to decrease, making the confidence interval
narrower.

• Decreasing the sample size causes the error bound to increase, making the confidence interval
wider.

8.2.3 Working Backwards to Find the Error Bound or Sample Mean

When we calculate a confidence interval, we find the sample mean and calculate the error bound and use
them  to  calculate  the  confidence  interval.  But  sometimes  when  we  read  statistical  studies,  the  study  may 
state  the  confidence  interval  only.  If  we  know  the  confidence  interval,  we  can  work  backwards  to  find  both 
the  error  bound  and  the  sample  mean. 

Finding  the  Error  Bound 

•  From  the  upper  value  for  the  interval,  subtract  the  sample  mean 

•  OR,  From  the  upper  value  for  the  interval,  subtract  the  lower  value.  Then  divide  the  difference  by  2. 

Finding  the  Sample  Mean 

•  Subtract  the  error  bound  from  the  upper  value  of  the  confidence  interval 

•  OR,  Average  the  upper  and  lower  endpoints  of  the  confidence  interval 

Notice that there are two methods to perform each calculation. You can choose the method that is easier to
use with the information you know.

Example  8.5 

Suppose we know that a confidence interval is (67.18, 68.82) and we want to find the error bound.
We may know that the sample mean is 68. Or perhaps our source only gave the confidence interval
and did not tell us the value of the sample mean.

Calculate  the  Error  Bound: 

• If we know that the sample mean is 68: EBM = 68.82 − 68 = 0.82

• If we don't know the sample mean: $EBM = \frac{68.82 - 67.18}{2} = 0.82$

Calculate the Sample Mean:

• If we know the error bound: x̄ = 68.82 − 0.82 = 68

• If we don't know the error bound: $\bar{x} = \frac{67.18 + 68.82}{2} = 68$
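
When only the endpoints are reported, both quantities come from simple arithmetic on the interval. A minimal Python sketch follows; the two function names are made up for this illustration.

```python
def error_bound_from_interval(lower, upper):
    """Half the width of a symmetric confidence interval."""
    return (upper - lower) / 2


def sample_mean_from_interval(lower, upper):
    """Midpoint of a symmetric confidence interval."""
    return (lower + upper) / 2


lower, upper = 67.18, 68.82  # interval from Example 8.5
ebm = error_bound_from_interval(lower, upper)
x_bar = sample_mean_from_interval(lower, upper)
print(round(ebm, 2), round(x_bar, 2))
```

This only works for symmetric intervals of the form (point estimate − error bound, point estimate + error bound).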


8.2.4  Calculating  the  Sample  Size  n 

If  researchers  desire  a  specific  margin  of  error,  then  they  can  use  the  error  bound  formula  to  calculate  the 
required  sample  size. 

The error bound formula for a population mean when the population standard deviation is known is
$EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$.

The formula for sample size is $n = \frac{z^2 \sigma^2}{EBM^2}$, found by solving the error bound formula for n.

In this formula, z is $z_{\alpha/2}$, corresponding to the desired confidence level. A researcher planning a study who
wants a specified confidence level and error bound can use this formula to calculate the size of the sample
needed for the study.

Example  8.6 

The population standard deviation for the age of Foothill College students is 15 years. If we
want to be 95% confident that the sample mean age is within 2 years of the true population mean
age of Foothill College students, how many randomly selected Foothill College students must be
surveyed?

From the problem, we know that σ = 15 and EBM = 2.
$z = z_{0.025} = 1.96$, because the confidence level is 95%.

$n = \frac{z^2 \sigma^2}{EBM^2} = \frac{1.96^2 \cdot 15^2}{2^2} = 216.09$ using the sample size equation.

Use n = 217: Always round the answer UP to the next higher integer to ensure that the sample
size is large enough.

Therefore, 217 Foothill College students should be surveyed in order to be 95% confident that we
are within 2 years of the true population mean age of Foothill College students.
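
The sample size calculation, including the round-up rule, can be sketched in Python as follows. This assumes Python 3.8 or later; required_sample_size is a hypothetical helper name. It uses the unrounded z-score, so the intermediate value is about 216.08 instead of 216.09, but the rounded-up answer is the same.

```python
from math import ceil
from statistics import NormalDist


def required_sample_size(sigma, ebm, confidence_level):
    """Smallest n with z_{alpha/2} * sigma / sqrt(n) <= ebm (hypothetical helper)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    n_exact = (z * sigma / ebm) ** 2
    return ceil(n_exact)  # always round UP to the next integer


# Example 8.6: sigma = 15 years, EBM = 2 years, CL = 0.95
n_needed = required_sample_size(sigma=15, ebm=2, confidence_level=0.95)
print(n_needed)
```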

**With  contributions  from  Roberta  Bloom 

8.3 Confidence Interval, Single Population Mean, Standard Deviation
Unknown, Student's-t

In practice, we rarely know the population standard deviation. In the past, when the sample size was large,
this did not present a problem to statisticians. They used the sample standard deviation s as an estimate
for σ and proceeded as before to calculate a confidence interval with close enough results. However,
statisticians ran into problems when the sample size was small. A small sample size caused inaccuracies in
the confidence interval.

William S. Gosset (1876-1937) of the Guinness brewery in Dublin, Ireland ran into this problem. His experiments
with hops and barley produced very few samples. Just replacing σ with s did not produce accurate
results when he tried to calculate a confidence interval. He realized that he could not use a normal distribution
for the calculation; he found that the actual distribution depends on the sample size. This problem
led him to "discover" what is called the Student's-t distribution. The name comes from the fact that Gosset
wrote under the pen name "Student."

Up  until  the  mid  1970s,  some  statisticians  used  the  normal  distribution  approximation  for  large  sample 
sizes  and  only  used  the  Student's-t  distribution  for  sample  sizes  of  at  most  30.  With  the  common  use  of 
graphing  calculators  and  computers,  the  practice  is  to  use  the  Student's-t  distribution  whenever  s  is  used 
as an estimate for σ.

If you draw a simple random sample of size n from a population that has approximately a normal distribution
with mean μ and unknown population standard deviation σ and calculate the t-score $t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$,
then the t-scores follow a Student's-t distribution with n − 1 degrees of freedom. The t-score has the same
interpretation as the z-score. It measures how far x̄ is from its mean μ. For each sample size n, there is a
different Student's-t distribution.

The degrees of freedom, n − 1, come from the calculation of the sample standard deviation s. In Chapter
2, we used n deviations (x − x̄ values) to calculate s. Because the sum of the deviations is 0, we can find
the last deviation once we know the other n − 1 deviations. The other n − 1 deviations can change or vary
freely. We call the number n − 1 the degrees of freedom (df).
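
A small numerical illustration of this idea, written in Python, is sketched below. The four data values are made up for the illustration and are not taken from the text.

```python
data = [4, 7, 9, 12]  # made-up sample, n = 4
mean = sum(data) / len(data)
deviations = [x - mean for x in data]

# The deviations sum to 0, so the last one is fixed by the other n - 1:
# it must equal minus their sum.
last_from_others = -sum(deviations[:-1])
print(deviations, sum(deviations), last_from_others)
```

Only the first n − 1 = 3 deviations are free to vary; the fourth is forced to be whatever value makes the total zero.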

Properties  of  the  Student's-t  Distribution 

•  The  graph  for  the  Student's-t  distribution  is  similar  to  the  Standard  Normal  curve. 

•  The  mean  for  the  Student's-t  distribution  is  0  and  the  distribution  is  symmetric  about  0. 

This content is available online at <http://cnx.org/content/m16959/1.24/>.




• The Student's-t distribution has more probability in its tails than the Standard Normal distribution
because the spread of the t distribution is greater than the spread of the Standard Normal. So the
graph of the Student's-t distribution will be thicker in the tails and shorter in the center than the
graph of the Standard Normal distribution.

• The exact shape of the Student's-t distribution depends on the "degrees of freedom". As the degrees
of freedom increase, the graph of the Student's-t distribution becomes more like the graph of the Standard
Normal distribution.

• The underlying population of individual observations is assumed to be normally distributed with
unknown population mean μ and unknown population standard deviation σ. The size of the underlying
population is generally not relevant unless it is very small. If it is bell shaped (normal) then the
assumption is met and doesn't need discussion. Random sampling is assumed but it is a completely
separate assumption from normality.
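
The "shorter in the center, thicker in the tails" property can be checked by comparing the heights of the two curves. The sketch below, assuming Python 3.8 or later, writes out the standard density formula of the Student's-t distribution; the function name t_pdf and the degrees of freedom 2, 14 and 100 are choices made for this illustration.

```python
from math import gamma, pi, sqrt
from statistics import NormalDist


def t_pdf(t, df):
    """Density of the Student's-t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


normal = NormalDist()
# Height of each curve at the center (t = 0) and in the tail (t = 3)
for df in (2, 14, 100):
    print(df, round(t_pdf(0, df), 4), round(t_pdf(3, df), 4))
print("normal", round(normal.pdf(0), 4), round(normal.pdf(3), 4))
```

For small degrees of freedom the t curve should be lower at 0 and higher at 3 than the Standard Normal curve, and the gap should shrink as the degrees of freedom grow.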

Calculators and computers can easily calculate any Student's-t probabilities. The TI-83, 83+, 84+ have a tcdf
function to find the probability for given values of t. The grammar for the tcdf command is tcdf(lower
bound, upper bound, degrees of freedom). However for confidence intervals, we need to use inverse
probability to find the value of t when we know the probability.

For the TI-84+ you can use the invT command on the DISTRibution menu. The invT command works
similarly to the invNorm command. The invT command requires two inputs: invT(area to the left, degrees of
freedom). The output is the t-score that corresponds to the area we specified.

The  TI-83  and  83+  do  not  have  the  invT  command.  (The  TI-89  has  an  inverse  T  command.) 

A probability table for the Student's-t distribution can also be used. The table gives t-scores that correspond
to the confidence level (column) and degrees of freedom (row). (The TI-86 does not have an invT program
or command, so if you are using that calculator, you need to use a probability table for the Student's-t distribution.)
When using a t-table, note that some tables are formatted to show the confidence level in the column
headings, while the column headings in some tables may show only the corresponding area in one or both tails.

A Student's-t table (see 15. Tables in the Table of Contents) gives t-scores given the degrees of freedom
and the right-tailed probability. The table is very limited. Calculators and computers can easily
calculate any Student's-t probabilities.

The notation for the Student's-t distribution (using T as the random variable) is

• $T \sim t_{df}$ where df = n − 1.

• For example, if we have a sample of size n = 20 items, then we calculate the degrees of freedom as
df = n − 1 = 20 − 1 = 19 and we write the distribution as $T \sim t_{19}$.

If the population standard deviation is not known, the error bound for a population mean is:

• $EBM = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

• $t_{\alpha/2}$ is the t-score with area to the right equal to α/2

• use df = n − 1 degrees of freedom

• s = sample standard deviation

The  format  for  the  confidence  interval  is: 

(x̄ − EBM, x̄ + EBM).

The TI-83, 83+ and 84 calculators have a function that calculates the confidence interval directly. To get to
it:

Press STAT.

Arrow over to TESTS.

Arrow down to 8:TInterval and press ENTER (or just press 8).

Example 8.7

Suppose you do a study of acupuncture to determine how effective it is in relieving pain.
You measure sensory rates for 15 subjects with the results given below. Use the sample data
to construct a 95% confidence interval for the mean sensory rate for the population (assumed
normal) from which you took the data.

The solution is shown step-by-step and by using the TI-83, 83+ and 84+ calculators.

8.6; 9.4; 7.9; 6.8; 8.3; 7.3; 9.2; 9.6; 8.7; 11.4; 10.3; 5.4; 8.1; 5.5; 6.9

Solution 

•  You  can  use  technology  to  directly  calculate  the  confidence  interval. 

•  The  first  solution  is  step-by-step  (Solution  A). 

• The second solution uses the TI-83+ and TI-84 calculators (Solution B).

Solution A

To find the confidence interval, you need the sample mean, x̄, and the EBM.
x̄ = 8.2267; s = 1.6722; n = 15

df = 15 − 1 = 14

CL = 0.95 so α = 1 − CL = 1 − 0.95 = 0.05
α/2 = 0.025, so $t_{\alpha/2} = t_{0.025}$

The area to the right of $t_{0.025}$ is 0.025 and the area to the left of $t_{0.025}$ is 1 − 0.025 = 0.975
$t_{\alpha/2} = t_{0.025} = 2.14$ using invT(.975,14) on the TI-84+ calculator.

$EBM = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}} = 2.14 \cdot \frac{1.6722}{\sqrt{15}} = 0.924$

x̄ − EBM = 8.2267 − 0.9240 = 7.30

x̄ + EBM = 8.2267 + 0.9240 = 9.15

The 95% confidence interval is (7.30, 9.15).

We estimate with 95% confidence that the true population mean sensory rate is between 7.30 and
9.15.

Solution B

Using a function of the TI-83, TI-83+ or TI-84 calculators:

Press STAT and arrow over to TESTS.

Arrow down to 8:TInterval and press ENTER (or you can just press 8). Arrow to Data and press
ENTER.

Arrow down to List and enter the list name where you put the data.
Arrow down to Freq and enter 1.
Arrow down to C-level and enter .95
Arrow down to Calculate and press ENTER.
The 95% confidence interval is (7.3006, 9.1527)
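
The calculation can also be reproduced on a computer. The sketch below is one possible approach, assuming only the Python standard library, which has no built-in inverse Student's-t function. It therefore approximates the area under the t density with Simpson's rule and searches for the t-score by bisection. The helper names t_pdf, t_cdf and t_critical are made up for this illustration; statistical software normally provides this lookup directly.

```python
from math import gamma, pi, sqrt
from statistics import mean, stdev


def t_pdf(t, df):
    """Density of the Student's-t distribution with df degrees of freedom."""
    coeff = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coeff * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """Area to the left of t >= 0, using Simpson's rule on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3  # 0.5 is the area to the left of 0


def t_critical(area_left, df):
    """Bisection search for the t-score with the given area to its left (> 0.5)."""
    low, high = 0.0, 100.0
    for _ in range(60):
        mid = (low + high) / 2
        if t_cdf(mid, df) < area_left:
            low = mid
        else:
            high = mid
    return (low + high) / 2


data = [8.6, 9.4, 7.9, 6.8, 8.3, 7.3, 9.2, 9.6, 8.7, 11.4, 10.3, 5.4, 8.1, 5.5, 6.9]
n = len(data)
x_bar, s = mean(data), stdev(data)  # stdev divides by n - 1
t = t_critical(0.975, n - 1)        # plays the role of invT(.975, 14)
ebm = t * s / sqrt(n)
lower, upper = x_bar - ebm, x_bar + ebm
print(round(t, 3), round(lower, 4), round(upper, 4))
```

Because the t-score is not rounded to 2.14, the endpoints should agree with the calculator output in Solution B more closely than with the hand calculation in Solution A.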

NOTE: When calculating the error bound, a probability table for the Student's-t distribution can
also be used to find the value of t. The table gives t-scores that correspond to the confidence level
(column) and degrees of freedom (row); the t-score is found where the row and column intersect
in the table.

**With  contributions  from  Roberta  Bloom 


8.4 Confidence Interval for a Population Proportion

During an election year, we see articles in the newspaper that state confidence intervals in terms of proportions
or percentages. For example, a poll for a particular candidate running for president might show
that the candidate has 40% of the vote within 3 percentage points. Often, election polls are calculated with
95% confidence. So, the pollsters would be 95% confident that the true proportion of voters who favored
the candidate would be between 0.37 and 0.43: (0.40 − 0.03, 0.40 + 0.03).

Investors  in  the  stock  market  are  interested  in  the  true  proportion  of  stocks  that  go  up  and  down  each  week. 
Businesses  that  sell  personal  computers  are  interested  in  the  proportion  of  households  in  the  United  States 
that  own  personal  computers.  Confidence  intervals  can  be  calculated  for  the  true  proportion  of  stocks  that 
go  up  or  down  each  week  and  for  the  true  proportion  of  households  in  the  United  States  that  own  personal 
computers. 

The procedure to find the confidence interval, the sample size, the error bound, and the confidence level
for a proportion is similar to that for the population mean, but the formulas are different.

How do you know you are dealing with a proportion problem? First, the underlying distribution is
binomial. (There is no mention of a mean or average.) If X is a binomial random variable, then X ~ B(n, p)
where n = the number of trials and p = the probability of a success. To form a proportion, take X, the
random variable for the number of successes and divide it by n, the number of trials (or the sample size).
The random variable P' (read "P prime") is that proportion,

$P' = \frac{X}{n}$

(Sometimes the random variable is denoted as P̂, read "P hat".)

When  n  is  large  and  p  is  not  close  to  0  or  1,  we  can  use  the  normal  distribution  to  approximate  the  binomial. 

$X \sim N\left(n \cdot p, \sqrt{n \cdot p \cdot q}\right)$

If we divide the random variable by n, the mean by n, and the standard deviation by n, we get a normal
distribution of proportions with P', called the estimated proportion, as the random variable. (Recall that a
proportion = the number of successes divided by n.)

$\frac{X}{n} = P' \sim N\left(\frac{n \cdot p}{n}, \frac{\sqrt{n \cdot p \cdot q}}{n}\right)$

Using algebra to simplify: $\frac{\sqrt{n \cdot p \cdot q}}{n} = \sqrt{\frac{p \cdot q}{n}}$

This content is available online at <http://cnx.org/content/m16963/1.20/>.

P' follows a normal distribution for proportions: $P' \sim N\left(p, \sqrt{\frac{p \cdot q}{n}}\right)$

The confidence interval has the form (p' − EBP, p' + EBP).

$p' = \frac{x}{n}$

p' = the estimated proportion of successes (p' is a point estimate for p, the true proportion)

x = the number of successes.

n = the size of the sample

The error bound for a proportion is

$EBP = z_{\alpha/2} \cdot \sqrt{\frac{p' \cdot q'}{n}}$ where q' = 1 − p'

This formula is similar to the error bound formula for a mean, except that the "appropriate standard deviation"
is different. For a mean, when the population standard deviation is known, the appropriate standard
deviation that we use is $\frac{\sigma}{\sqrt{n}}$. For a proportion, the appropriate standard deviation is $\sqrt{\frac{p \cdot q}{n}}$.
However, in the error bound formula, we use $\sqrt{\frac{p' \cdot q'}{n}}$ as the standard deviation, instead of $\sqrt{\frac{p \cdot q}{n}}$.

In the error bound formula, the sample proportions p' and q' are estimates of the unknown population
proportions p and q. The estimated proportions p' and q' are used because p and q are not known. p' and
q' are calculated from the data. p' is the estimated proportion of successes. q' is the estimated proportion of
failures.

The confidence interval can only be used if the number of successes np' and the number of failures nq' are
both larger than 5.

NOTE: For the normal distribution of proportions, the z-score formula is as follows.

If $P' \sim N\left(p, \sqrt{\frac{p \cdot q}{n}}\right)$ then the z-score formula is $z = \frac{p' - p}{\sqrt{\frac{p \cdot q}{n}}}$


Example  8.8 

Suppose  that  a  market  research  firm  is  hired  to  estimate  the  percent  of  adults  living  in  a  large 
city  who  have  cell  phones.  500  randomly  selected  adult  residents  in  this  city  are  surveyed  to 
determine  whether  they  have  cell  phones.  Of  the  500  people  surveyed,  421  responded  yes  -  they 
own cell phones. Using a 95% confidence level, compute a confidence interval estimate for the
true proportion of adult residents of this city who have cell phones.

Solution 

•  You  can  use  technology  to  directly  calculate  the  confidence  interval. 

•  The  first  solution  is  step-by-step  (Solution  A). 

•  The  second  solution  uses  a  function  of  the  TI-83,  83+  or  84  calculators  (Solution  B). 
Solution  A 

Let  X  =  the  number  of  people  in  the  sample  who  have  cell  phones.  X  is  binomial.  X  ■~ 
B(500,ii). 
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To  calculate  the  confidence  interval,  you  must  find  p' ,  q' ,  and  EBP. 


n 


500 


X  =  the  number  of  successes  —  421 


P' 


i  -      -  0.842 


p'  —  0.842  is  the  sample  proportion;  this  is  the  point  estimate  of  the  population  proportion. 

(?'  =  l-p'  =  1  -0.842  =  0.158 

Since  CL     0.95,  then  a  =  l-CL=l  -  0.95  =  0.05       |  =  0.025. 


Use the TI-83, 83+ or 84+ calculator command invNorm(0.975,0,1) to find $z_{.025}$. Remember that the
area to the right of $z_{.025}$ is 0.025 and the area to the left of $z_{.025}$ is 0.975. This can also be found
using appropriate commands on other calculators, using a computer, or using a Standard Normal
probability table.


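$EBP = z_{\alpha/2} \cdot \sqrt{\frac{p'q'}{n}} = 1.96 \cdot \sqrt{\frac{(0.842)(0.158)}{500}} = 0.032$
p' - EBP = 0.842 - 0.032 = 0.810
p' + EBP = 0.842 + 0.032 = 0.874

The confidence interval for the true binomial population proportion is
(p' - EBP, p' + EBP) = (0.810, 0.874).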

Interpretation 

We  estimate  with  95%  confidence  that  between  81%  and  87.4%  of  all  adult  residents  of  this  city 
have  cell  phones. 

Explanation  of  95%  Confidence  Level 

95%  of  the  confidence  intervals  constructed  in  this  way  would  contain  the  true  value  for  the 
population  proportion  of  all  adult  residents  of  this  city  who  have  cell  phones. 

Solution  B 

Using a function of the TI-83, 83+ or 84 calculators:

Press STAT and arrow over to TESTS.
Arrow down to A: 1-PropZInt. Press ENTER.
Arrow  down  to  x  and  enter  421. 
Arrow  down  to  n  and  enter  500. 
Arrow  down  to  C-Level  and  enter  .95. 
Arrow  down  to  Calculate  and  press  ENTER. 
The  confidence  interval  is  (0.81003,  0.87397). 
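The same calculation can be sketched on a computer. The Python below is only an illustration of the steps in Solution A, not part of the original text: the function name `proportion_confidence_interval` is made up for this example, and `NormalDist().inv_cdf` from the standard library plays the role of invNorm. It also checks the condition that np' and nq' are both larger than 5.

```python
from math import sqrt
from statistics import NormalDist


def proportion_confidence_interval(x, n, confidence_level):
    """Return (p', EBP, lower, upper) for a single population proportion.

    Hypothetical helper written for this example only.
    """
    p_prime = x / n          # estimated proportion of successes
    q_prime = 1 - p_prime    # estimated proportion of failures

    # Condition from the text: np' and nq' must both be larger than 5.
    if n * p_prime <= 5 or n * q_prime <= 5:
        raise ValueError("np' and nq' must both be larger than 5")

    alpha = 1 - confidence_level
    # z_{alpha/2}: the area to its left is 1 - alpha/2 (what invNorm returns).
    z = NormalDist().inv_cdf(1 - alpha / 2)
    ebp = z * sqrt(p_prime * q_prime / n)
    return p_prime, ebp, p_prime - ebp, p_prime + ebp


# Example 8.8: 421 of 500 surveyed adults have cell phones, CL = 0.95.
p_prime, ebp, lower, upper = proportion_confidence_interval(421, 500, 0.95)
print(f"p' = {p_prime:.3f}, EBP = {ebp:.3f}")
print(f"95% confidence interval: ({lower:.5f}, {upper:.5f})")
```

Rounded to five decimal places, the interval agrees with the calculator result (0.81003, 0.87397).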


Example  8.9 

For  a  class  project,  a  political  science  student  at  a  large  university  wants  to  estimate  the  percent 
of  students  that  are  registered  voters.  He  surveys  500  students  and  finds  that  300  are  registered 
voters.  Compute  a  90%  confidence  interval  for  the  true  percent  of  students  that  are  registered 
voters and interpret the confidence interval.


Solution 

•  You  can  use  technology  to  directly  calculate  the  confidence  interval. 

•  The  first  solution  is  step-by-step  (Solution  A). 

• The second solution uses a function of the TI-83, 83+ or 84 calculators (Solution B).

Solution  A 

x = 300 and n = 500.

$p' = \frac{x}{n} = \frac{300}{500} = 0.600$

q' = 1 - p' = 1 - 0.600 = 0.400

Since CL = 0.90, then $\alpha$ = 1 - CL = 1 - 0.90 = 0.10, so $\frac{\alpha}{2}$ = 0.05.

$z_{\alpha/2} = z_{.05} = 1.645$

Use the TI-83, 83+ or 84+ calculator command invNorm(0.95,0,1) to find $z_{.05}$. Remember that
the area to the right of $z_{.05}$ is 0.05 and the area to the left of $z_{.05}$ is 0.95. This can also be found
using appropriate commands on other calculators, using a computer, or using a Standard Normal
probability table.

$EBP = z_{\alpha/2} \cdot \sqrt{\frac{p'q'}{n}} = 1.645 \cdot \sqrt{\frac{(0.60)(0.40)}{500}} = 0.036$
p'  -  EBP  =  0.60  -  0.036  =  0.564 
p'  +  EBP  =  0.60  +  0.036  =  0.636 

The  confidence  interval  for  the  true  binomial  population  proportion  is 
(p'  -  EBP,  p'  +  EBP)  =(0.564, 0.636). 

Interpretation: 

•  We  estimate  with  90%  confidence  that  the  true  percent  of  all  students  that  are  registered 
voters  is  between  56.4%  and  63.6%. 

• Alternate Wording: We estimate with 90% confidence that between 56.4% and 63.6% of ALL
students are registered voters.

Explanation  of  90%  Confidence  Level 

90%  of  all  confidence  intervals  constructed  in  this  way  contain  the  true  value  for  the  population 
percent  of  students  that  are  registered  voters. 

Solution  B 

Using  a  function  of  the  TI-83,  83+  or  84  calculators: 

Press  STAT  and  arrow  over  to  TESTS. 
Arrow down to A: 1-PropZInt. Press ENTER.
Arrow  down  to  x  and  enter  300. 
Arrow  down  to  n  and  enter  500. 
Arrow  down  to  C-Level  and  enter  .90. 
Arrow  down  to  Calculate  and  press  ENTER. 
The  confidence  interval  is  (0.564,  0.636). 
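The "Explanation of 90% Confidence Level" in Example 8.9 can be illustrated with a small simulation. The sketch below is not from the original text. It assumes a true population proportion of 0.60, which is a made-up value for illustration only, since in practice p is unknown. It draws many samples of 500 and counts how often the interval built from each sample contains that true value. The function name `coverage_rate` and the fixed seed are likewise choices made for this example.

```python
import random
from math import sqrt
from statistics import NormalDist


def coverage_rate(true_p, n, confidence_level, trials, seed=0):
    """Fraction of simulated confidence intervals that contain true_p."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    hits = 0
    for _ in range(trials):
        # Draw one sample of size n and count the successes.
        x = sum(rng.random() < true_p for _ in range(n))
        p_prime = x / n
        ebp = z * sqrt(p_prime * (1 - p_prime) / n)
        if p_prime - ebp <= true_p <= p_prime + ebp:
            hits += 1
    return hits / trials


# Assumed true proportion 0.60 (hypothetical), n = 500, CL = 0.90.
rate = coverage_rate(true_p=0.60, n=500, confidence_level=0.90, trials=2000)
print(f"Share of intervals containing the true proportion: {rate:.3f}")
```

The printed share should land near 0.90. It will not be exactly 0.90, because the simulation itself is a random sample of intervals.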
8.4.1 Calculating the Sample Size n

If researchers desire a specific margin of error, then they can use the error bound formula to calculate the
required sample size.

The error bound formula for a population proportion is

• $EBP = z_{\alpha/2} \cdot \sqrt{\frac{p'q'}{n}}$

• Solving for n gives you an equation for the sample size.

• $n = \frac{z_{\alpha/2}^2 \cdot p'q'}{EBP^2}$
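One way to carry out the "solving for n" step is sketched below in LaTeX: square both sides of the error bound formula for a proportion, then isolate n. The symbols are the ones used throughout this section.

```latex
\begin{align*}
EBP   &= z_{\alpha/2} \sqrt{\frac{p'q'}{n}} \\
EBP^2 &= z_{\alpha/2}^2 \, \frac{p'q'}{n} \\
n     &= \frac{z_{\alpha/2}^2 \, p'q'}{EBP^2}
\end{align*}
```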


Example  8.10 

Suppose a mobile phone company wants to determine the current percentage of customers
aged 50+ that use text messaging on their cell phone. How many customers aged 50+ should
the company survey in order to be 90% confident that the estimated (sample) proportion is
within 3 percentage points of the true population proportion of customers aged 50+ that use text
messaging on their cell phone?


Solution 

From the problem, we know that EBP = 0.03 (3% = 0.03) and
$z_{\alpha/2} = z_{.05} = 1.645$ because the confidence level is 90%.

However, in order to find n, we need to know the estimated (sample) proportion p'. Remember
that q' = 1 - p'. But we do not know p' yet. Since we multiply p' and q' together, we make them both
equal to 0.5 because p'q' = (.5)(.5) = .25 results in the largest possible product. (Try other products:
(.6)(.4) = .24; (.3)(.7) = .21; (.2)(.8) = .16 and so on.) The largest possible product gives us the largest n.
This gives us a large enough sample so that we can be 90% confident that we are within 3 percentage
points of the true population proportion. To calculate the sample size n, use the formula and
make the substitutions.

$n = \frac{z_{\alpha/2}^2 \cdot p'q'}{EBP^2}$ gives $n = \frac{1.645^2 (.5)(.5)}{.03^2} = 751.7$

Round the answer to the next higher value. The sample size should be 752 cell phone customers
aged 50+ in order to be 90% confident that the estimated (sample) proportion is within 3 percentage
points of the true population proportion of all customers aged 50+ that use text messaging on
their cell phone.
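The sample size calculation in Example 8.10 can be sketched in Python as follows. This is an illustration only: the function name `sample_size_for_proportion` is invented for this example, and the default p' = 0.5 reflects the "largest possible product" argument above. Because `inv_cdf` returns z to more decimal places than the table value 1.645, the unrounded result differs slightly from 751.7, but rounding up gives the same sample size.

```python
from math import ceil
from statistics import NormalDist


def sample_size_for_proportion(ebp, confidence_level, p_prime=0.5):
    """Sample size n from n = z^2 * p'q' / EBP^2, rounded up.

    Hypothetical helper; p_prime defaults to 0.5, the most conservative choice.
    """
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    q_prime = 1 - p_prime
    n = z ** 2 * p_prime * q_prime / ebp ** 2
    return ceil(n)  # always round up to the next whole person


# Example 8.10: within 3 percentage points at 90% confidence.
print(sample_size_for_proportion(0.03, 0.90))  # 752
```

Passing a different planning value, such as `p_prime=0.6`, gives a smaller n, which is why 0.5 is used when nothing is known about p'.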

**With  contributions  from  Roberta  Bloom. 
8.5 Summary of Formulas^

Formula 8.1: General form of a confidence interval

(lower value, upper value) = (point estimate - error bound, point estimate + error bound)

Formula 8.2: To find the error bound when you know the confidence interval

error bound = upper value - point estimate       OR       error bound = $\frac{\text{upper value} - \text{lower value}}{2}$

Formula 8.3: Single Population Mean, Known Standard Deviation, Normal Distribution

Use the Normal Distribution for Means (Section 7.2): $EBM = z_{\alpha/2} \cdot \frac{\sigma}{\sqrt{n}}$

The confidence interval has the format $(\bar{x} - EBM, \bar{x} + EBM)$.

Formula 8.4: Single Population Mean, Unknown Standard Deviation, Student's-t Distribution

Use the Student's-t Distribution with degrees of freedom df = n - 1: $EBM = t_{\alpha/2} \cdot \frac{s}{\sqrt{n}}$

Formula  8.5:  Single  Population  Proportion,  Normal  Distribution 

Use the Normal Distribution for a single population proportion $p' = \frac{x}{n}$

$EBP = z_{\alpha/2} \cdot \sqrt{\frac{p'q'}{n}}$, where p' + q' = 1

The confidence interval has the format (p' - EBP, p' + EBP).

Formula 8.6: Point Estimates

$\bar{x}$ is a point estimate for $\mu$

p' is a point estimate for p

s is a point estimate for $\sigma$
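As an illustration of Formulas 8.2 and 8.3, the Python sketch below builds a confidence interval for a mean when the population standard deviation is known, and then recovers the error bound from the finished interval. The function names and the input numbers are hypothetical and chosen only for this example. Formula 8.4 would need a Student's-t quantile in place of z, which the Python standard library does not provide; a statistics package such as SciPy (`scipy.stats.t.ppf`) is one option.

```python
from math import sqrt
from statistics import NormalDist


def mean_confidence_interval_known_sigma(x_bar, sigma, n, confidence_level):
    """Formula 8.3: (x_bar - EBM, x_bar + EBM) with EBM = z * sigma / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    ebm = z * sigma / sqrt(n)
    return x_bar - ebm, x_bar + ebm


def error_bound_from_interval(lower, upper):
    """Formula 8.2: the error bound is half the width of the interval."""
    return (upper - lower) / 2


# Hypothetical numbers, not taken from the text:
# sample mean 100, known sigma 12, n = 36, CL = 0.95.
lower, upper = mean_confidence_interval_known_sigma(100, 12, 36, 0.95)
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
print(f"Error bound recovered from the interval: {error_bound_from_interval(lower, upper):.2f}")
```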


^This content is available online at <http://cnx.org/content/m16973/1.8/>.
8.6 Practice 1: Confidence Intervals for Means, Known Population Standard Deviation^

8.6.1 Student Learning Outcomes

• The student will calculate confidence intervals for means when the population standard deviation is
known.

8.6.2  Given 

The mean age for all Foothill College students for a recent Fall term was 33.2. The population standard
deviation has been pretty consistent at 15. Suppose that twenty-five Winter students were randomly selected.
The mean age for the sample was 30.4. We are interested in the true mean age for Winter Foothill College
students. (http://research.fhda.edu/factbook/FH_Demo_Trends/FoothillDemographicTrends.htm)

Let X = the age of a Winter Foothill College student

8.6.3  Calculating  the  Confidence  Interval 

Exercise 8.6.1  (Solution on p. 371.)

$\bar{x}$ =

Exercise 8.6.2  (Solution on p. 371.)

n =

Exercise 8.6.3  (Solution on p. 371.)

15 = (insert symbol here)

Exercise 8.6.4  (Solution on p. 371.)

Define the Random Variable, $\bar{X}$, in words.

$\bar{X}$ =

Exercise 8.6.5  (Solution on p. 371.)

What is $\bar{x}$ estimating?

Exercise 8.6.6  (Solution on p. 371.)

Is $\sigma_x$ known?

Exercise 8.6.7  (Solution on p. 371.)

As a result of your answer to (4), state the exact distribution to use when calculating the Confidence
Interval.


8.6.4  Explaining  the  Confidence  Interval 

Construct  a  95%  Confidence  Interval  for  the  true  mean  age  of  Winter  Foothill  College  students. 

Exercise 8.6.8  (Solution on p. 371.)

How much area is in both tails (combined)? $\alpha$ =

Exercise 8.6.9  (Solution on p. 371.)

How much area is in each tail? $\frac{\alpha}{2}$ =


Exercise  8.6.10  (Solution  on  p.  371.) 

Identify  the  following  specifications: 


''This content is available online at <http://cnx.org/content/m16970/1.13/>.
^http://research.fhda.edu/factbook/FH_Demo_Trends/FoothillDemographicTrends.htm

a. lower limit =

b.  upper  limit  = 

c.  error  bound  = 

Exercise  8.6.11  (Solution  on  p.  371.) 

The  95%  Confidence  Interval  is:  

Exercise  8.6.12 

Fill in the blanks on the graph with the areas, upper and lower limits of the Confidence Interval, and
the sample mean.

[Graph with blanks to fill in; horizontal axis $\bar{X}$]

Figure 8.2


Exercise  8.6.13 

In  one  complete  sentence,  explain  what  the  interval  means. 


8.6.5  Discussion  Questions 
Exercise  8.6.14 

Using  the  same  mean,  standard  deviation  and  level  of  confidence,  suppose  that  n  were  69  instead 
of  25.  Would  the  error  bound  become  larger  or  smaller?  How  do  you  know? 

Exercise  8.6.15 

Using  the  same  mean,  standard  deviation  and  sample  size,  how  would  the  error  bound  change  if 
the  confidence  level  were  reduced  to  90%?  Why? 
8.7 Practice 2: Confidence Intervals for Means, Unknown Population
Standard Deviation*

8.7.1 Student Learning Outcomes

• The student will calculate confidence intervals for means when the population standard deviation is
unknown.


8.7.2  Given 

The following real data are the result of a random survey of 39 national flags (with replacement between
picks) from various countries. We are interested in finding a confidence interval for the true mean number
of colors on a national flag. Let X = the number of colors on a national flag.

X    Freq.
1    1
2    7
3    18
4    7
5    6

Table 8.1


8.7.3  Calculating  the  Confidence  Interval 

Exercise  8.7.1  (Solution  on  p.  371.) 

Calculate  the  following: 

a. $\bar{x}$ =

b. $s_x$ =

c. n =

Exercise  8.7.2  (Solution  on  p.  371.) 

Define the Random Variable, $\bar{X}$, in words. $\bar{X}$ =

Exercise 8.7.3  (Solution on p. 371.)

What is $\bar{x}$ estimating?

Exercise 8.7.4  (Solution on p. 371.)

Is $\sigma_x$ known?

Exercise 8.7.5  (Solution on p. 371.)

As a result of your answer to (4), state the exact distribution to use when calculating the Confidence
Interval.


'This content is available online at <http://cnx.org/content/m16971/1.14/>.

8.7.4 Confidence Interval for the True Mean Number

Construct  a  95%  Confidence  Interval  for  the  true  mean  number  of  colors  on  national  flags. 

Exercise  8.7.6  (Solution  on  p.  371.) 

How much area is in both tails (combined)? $\alpha$ =

Exercise 8.7.7  (Solution on p. 371.)

How much area is in each tail? $\frac{\alpha}{2}$ =

Exercise 8.7.8  (Solution on p. 371.)

Calculate the following:

a. lower limit =

b. upper limit =

c. error bound =


Exercise  8.7.9  (Solution  on  p.  372.) 

The  95%  Confidence  Interval  is: 

Exercise  8.7.10 

Fill in the blanks on the graph with the areas, upper and lower limits of the Confidence Interval,
and the sample mean.

[Graph with blanks to fill in; horizontal axis $\bar{X}$]

Figure 8.3


Exercise  8.7.11 

In  one  complete  sentence,  explain  what  the  interval  means. 


8.7.5  Discussion  Questions 
Exercise  8.7.12 

Using the same $\bar{x}$, $s_x$, and level of confidence, suppose that n were 69 instead of 39. Would the
error bound become larger or smaller? How do you know?

Exercise 8.7.13

Using the same $\bar{x}$, $s_x$, and n = 39, how would the error bound change if the confidence level were
reduced to 90%? Why?
8.8 Practice 3: Confidence Intervals for Proportions'

8.8.1  Student  Learning  Outcomes 

•  The  student  will  calculate  confidence  intervals  for  proportions. 

8.8.2  Given 

The Ice Chalet offers dozens of different beginning ice-skating classes. All of the class names are put into a
bucket. The 5 P.M., Monday night, ages 8 - 12, beginning ice-skating class was picked. In that class were 64
girls and 16 boys. Suppose that we are interested in the true proportion of girls, ages 8 - 12, in all beginning
ice-skating classes at the Ice Chalet. Assume that the children in the selected class are a random sample of
the population.

8.8.3  Estimated  Distribution 

Exercise 8.8.1

What is being counted?

Exercise 8.8.2

In words, define the Random Variable X. X =

Exercise 8.8.3

Calculate the following:

a. x =

b. n =

c. p' =

Exercise 8.8.4

State the estimated distribution of X. X ~

Exercise 8.8.5

Define a new Random Variable P'. What is p' estimating?

Exercise 8.8.6

In words, define the Random Variable P'. P' =

Exercise 8.8.7

State the estimated distribution of P'. P' ~


8.8.4  Explaining  the  Confidence  Interval 

Construct a 92% Confidence Interval for the true proportion of girls in the ages 8 - 12 beginning ice-skating
classes at the Ice Chalet.

Exercise 8.8.8

How much area is in both tails (combined)? $\alpha$ =

Exercise 8.8.9

How much area is in each tail? $\frac{\alpha}{2}$ =

Exercise  8.8.10 

Calculate  the  following: 

'This content is available online at <http://cnx.org/content/m16968/1.13/>.

a. lower limit =

b.  upper  limit  = 

c.  error  bound  = 

Exercise  8.8.11  (Solution  on  p.  372.) 

The  92%  Confidence  Interval  is: 

Exercise  8.8.12 

Fill in the blanks on the graph with the areas, upper and lower limits of the Confidence Interval, and
the sample proportion.

[Graph with blanks to fill in: $\frac{\alpha}{2}$ = ____, C.L. = ____, $\frac{\alpha}{2}$ = ____]

Figure 8.4


Exercise  8.8.13 

In  one  complete  sentence,  explain  what  the  interval  means. 


8.8.5  Discussion  Questions 
Exercise  8.8.14 

Using  the  same  p'  and  level  of  confidence,  suppose  that  n  were  increased  to  100.  Would  the  error 
bound  become  larger  or  smaller?  How  do  you  know? 

Exercise  8.8.15 

Using  the  same  p'  and  n  =  80,  how  would  the  error  bound  change  if  the  confidence  level  were 
increased  to  98%?  Why? 

Exercise  8.8.16 

If you decreased the allowable error bound, why would the minimum sample size increase (keeping
the same level of confidence)?

8.9 Homework °

NOTE: If you are using a Student's-t distribution for a homework problem below, you may assume
that the underlying population is normally distributed. (In general, you must first prove that
assumption, though.)

Exercise  8.9.1  (Solution  on  p.  372.) 

Among  various  ethnic  groups,  the  standard  deviation  of  heights  is  known  to  be  approximately  3 
inches.  We  wish  to  construct  a  95%  confidence  interval  for  the  mean  height  of  male  Swedes.  48 
male  Swedes  are  surveyed.  The  sample  mean  is  71  inches.  The  sample  standard  deviation  is  2.8 
inches. 

a. i. $\bar{x}$ =

ii. $\sigma$ =

iii. $s_x$ =

iv. n =

v. n - 1 =

b. Define the Random Variables X and $\bar{X}$, in words.

c. Which distribution should you use for this problem? Explain your choice.

d.  Construct  a  95%  confidence  interval  for  the  population  mean  height  of  male  Swedes. 

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

e. What will happen to the level of confidence obtained if 1000 male Swedes are surveyed instead
of 48? Why?

Exercise  8.9.2 

In  six  packages  of  "The  Flintstones®  Real  Fruit  Snacks"  there  were  5  Bam-Bam  snack  pieces.  The 
total  number  of  snack  pieces  in  the  six  bags  was  68.  We  wish  to  calculate  a  96%  confidence  interval 
for  the  population  proportion  of  Bam-Bam  snack  pieces. 

a. Define the Random Variables X and P', in words.

b. Which distribution should you use for this problem? Explain your choice.

c. Calculate p'.

d. Construct a 96% confidence interval for the population proportion of Bam-Bam snack pieces
per bag.

i. State the confidence interval.

ii. Sketch the graph.

iii. Calculate the error bound.

e. Do you think that six packages of fruit snacks yield enough data to give accurate results? Why
or why not?

Exercise  8.9.3  (Solution  on  p.  372.) 

A  random  survey  of  enrollment  at  35  community  colleges  across  the  United  States  yielded  the 
following  figures  (source:  Microsoft  Bookshelf):  6414;  1550;  2109;  9350;  21828;  4300;  5944;  5722; 
2825;  2044;  5481;  5200;  5853;  2750;  10012;  6357;  27000;  9414;  7681;  3200;  17500;  9200;  7380;  18314; 
6557;  13713;  17768;  7493;  2771;  2861;  1263;  7285;  28165;  5080;  11622.  Assume  the  underlying 
population is normal.

a. i. $\bar{x}$ =

^This content is available online at <http://cnx.org/content/m16966/1.16/>.

ii. $s_x$ =

iii. n =

iv. n - 1 =

b. Define the Random Variables X and $\bar{X}$, in words.

c. Which distribution should you use for this problem? Explain your choice.

d. Construct a 95% confidence interval for the population mean enrollment at community colleges
in the United States.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

e. What will happen to the error bound and confidence interval if 500 community colleges were
surveyed? Why?

Exercise  8.9.4 

From a stack of IEEE Spectrum magazines, announcements for 84 upcoming engineering conferences
were randomly picked. The mean length of the conferences was 3.94 days, with a standard
deviation of 1.28 days. Assume the underlying population is normal.

a. Define the Random Variables X and $\bar{X}$, in words.

b.  Which  distribution  should  you  use  for  this  problem?  Explain  your  choice. 

c.  Construct  a  95%  confidence  interval  for  the  population  mean  length  of  engineering  conferences. 

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

Exercise  8.9.5  (Solution  on  p.  373.) 

Suppose that a committee is studying whether or not there is waste of time in our judicial system.
It is interested in the mean amount of time individuals waste at the courthouse waiting to be called
for service. The committee randomly surveyed 81 people. The sample mean was 8 hours with a
sample standard deviation of 4 hours.

a. i. $\bar{x}$ =

ii. $s_x$ =

iii. n =

iv. n - 1 =

b. Define the Random Variables X and $\bar{X}$, in words.

c.  Which  distribution  should  you  use  for  this  problem?  Explain  your  choice. 

d.  Construct  a  95%  confidence  interval  for  the  population  mean  time  wasted. 

a.  State  the  confidence  interval. 

b.  Sketch  the  graph. 

c.  Calculate  the  error  bound. 

e. Explain in a complete sentence what the confidence interval means.

Exercise 8.9.6

Suppose that an accounting firm does a study to determine the time needed to complete one person's
tax forms. It randomly surveys 100 people. The sample mean is 23.6 hours. There is a known
standard deviation of 7.0 hours. The population distribution is assumed to be normal.

a. i. $\bar{x}$ =

ii. $\sigma$ =

iii. $s_x$ =

iv. n =

v. n - 1 =

b. Define the Random Variables X and $\bar{X}$, in words.

c.  Which  distribution  should  you  use  for  this  problem?  Explain  your  choice. 

d.  Construct  a  90%  confidence  interval  for  the  population  mean  time  to  complete  the  tax  forms. 

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

e. If the firm wished to increase its level of confidence and keep the error bound the same by
taking another survey, what changes should it make?

f. If the firm did another survey, kept the error bound the same, and only surveyed 49 people,
what would happen to the level of confidence? Why?

g. Suppose that the firm decided that it needed to be at least 96% confident of the population
mean length of time to within 1 hour. How would the number of people the firm surveys
change? Why?

Exercise  8.9.7  (Solution  on  p.  373.) 

A  sample  of  16  small  bags  of  the  same  brand  of  candies  was  selected.  Assume  that  the  population 
distribution  of  bag  weights  is  normal.  The  weight  of  each  bag  was  then  recorded.  The  mean 
weight was 2 ounces with a standard deviation of 0.12 ounces. The population standard deviation
is known to be 0.1 ounce.

a. i. $\bar{x}$ =

ii. $\sigma$ =

iii. $s_x$ =

iv. n =

v. n - 1 =

b. Define the Random Variable X, in words.

c. Define the Random Variable $\bar{X}$, in words.

d. Which distribution should you use for this problem? Explain your choice.

e.  Construct  a  90%  confidence  interval  for  the  population  mean  weight  of  the  candies. 

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

f. Construct a 98% confidence interval for the population mean weight of the candies.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

g. In complete sentences, explain why the confidence interval in (f) is larger than the confidence
interval in (e).

h. In complete sentences, give an interpretation of what the interval in (f) means.

Exercise 8.9.8

A  pharmaceutical  company  makes  tranquilizers.  It  is  assumed  that  the  distribution  for  the  length 
of  time  they  last  is  approximately  normal.  Researchers  in  a  hospital  used  the  drug  on  a  random 
sample  of  9  patients.  The  effective  period  of  the  tranquilizer  for  each  patient  (in  hours)  was  as 
follows: 2.7; 2.8; 3.0; 2.3; 2.3; 2.2; 2.8; 2.1; and 2.4.

a. i. $\bar{x}$ =

ii. $s_x$ =

iii. n =

iv. n - 1 =

b. Define the Random Variable X, in words.

c. Define the Random Variable $\bar{X}$, in words.

d.  Which  distribution  should  you  use  for  this  problem?  Explain  your  choice. 

e.  Construct  a  95%  confidence  interval  for  the  population  mean  length  of  time. 

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

f. What does it mean to be "95% confident" in this problem?

Exercise  8.9.9  (Solution  on  p.  373.) 

Suppose  that  14  children  were  surveyed  to  determine  how  long  they  had  to  use  training  wheels. 
It  was  revealed  that  they  used  them  an  average  of  6  months  with  a  sample  standard  deviation  of 
3  months.  Assume  that  the  underlying  population  distribution  is  normal. 

a. i. $\bar{x}$ =

ii. $s_x$ =

iii. n =

iv. n - 1 =

b. Define the Random Variable X, in words.

c. Define the Random Variable $\bar{X}$, in words.

d. Which distribution should you use for this problem? Explain your choice.

e. Construct a 99% confidence interval for the population mean length of time using training
wheels.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

f. Why would the error bound change if the confidence level was lowered to 90%?

Exercise 8.9.10

Insurance companies are interested in knowing the population percent of drivers who always
buckle up before riding in a car.

a. When designing a study to determine this population proportion, what is the minimum number
you would need to survey to be 95% confident that the population proportion is estimated
to within 0.03?

b. If it was later determined that it was important to be more than 95% confident and a new survey
was commissioned, how would that affect the minimum number you would need to survey?
Why?

Exercise  8.9.11  (Solution  on  p.  373.) 

Suppose  that  the  insurance  companies  did  do  a  survey.  They  randomly  surveyed  400  drivers  and 
found  that  320  claimed  to  always  buckle  up.  We  are  interested  in  the  population  proportion  of 
drivers  who  claim  to  always  buckle  up. 

a. i. x =

ii.  n  =  

iii.  p'  =  

b.  Define  the  Random  Variables  X  and  P',  in  words. 

c.  Which  distribution  should  you  use  for  this  problem?  Explain  your  choice. 
d. Construct a 95% confidence interval for the population proportion that claim to always buckle
up.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

e. If this survey were done by telephone, list 3 difficulties the companies might have in obtaining
random results.

Exercise  8.9.12 

Unoccupied seats on flights cause airlines to lose revenue. Suppose a large airline wants to estimate
its mean number of unoccupied seats per flight over the past year. To accomplish this, the
records of 225 flights are randomly selected and the number of unoccupied seats is noted for each
of the sampled flights. The sample mean is 11.6 seats and the sample standard deviation is 4.1
seats.

a. i. $\bar{x}$ =

ii. $s_x$ =

iii. n =

iv. n - 1 =

b. Define the Random Variables X and $\bar{X}$, in words.

c. Which distribution should you use for this problem? Explain your choice.

d. Construct a 92% confidence interval for the population mean number of unoccupied seats per
flight.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

Exercise  8.9.13  (Solution  on  p.  373.) 

According  to  a  recent  survey  of  1200  people,  61%  feel  that  the  president  is  doing  an  acceptable 
job.  We  are  interested  in  the  population  proportion  of  people  who  feel  the  president  is  doing  an 
acceptable  job. 

a.  Define  the  Random  Variables  X  and  P',  in  words. 

b.  Which  distribution  should  you  use  for  this  problem?  Explain  your  choice. 

c. Construct a 90% confidence interval for the population proportion of people who feel the president
is doing an acceptable job.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

Exercise  8.9.14 

A  survey  of  the  mean  amount  of  cents  off  that  coupons  give  was  done  by  randomly  surveying  one 
coupon per page from the coupon sections of a recent San Jose Mercury News. The following data
were collected: 20¢; 75¢; 50¢; 65¢; 30¢; 55¢; iOf; 40¢; 30¢; 55¢; $1.50; 40¢; 65¢; 40¢. Assume the
underlying distribution is approximately normal.

a. i. $\bar{x}$ =

ii. $s_x$ =

iii. n =

iv. n - 1 =

b. Define the Random Variables X and $\bar{X}$, in words.

c. Which distribution should you use for this problem? Explain your choice.

d.  Construct  a  95%  confidence  interval  for  the  population  mean  worth  of  coupons. 

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

e. If many random samples were taken of size 14, what percent of the confidence intervals constructed
should contain the population mean worth of coupons? Explain why.

Exercise  8.9.15  (Solution  on  p.  374.) 

An  article  regarding  interracial  dating  and  marriage  recently  appeared  in  the  Washington  Post.  Of 
the  1709  randomly  selected  adults,  315  identified  themselves  as  Latinos,  323  identified  themselves 
as  blacks,  254  identified  themselves  as  Asians,  and  779  identified  themselves  as  whites.  In  this 
survey,  86%  of  blacks  said  that  their  families  would  welcome  a  white  person  into  their  families. 
Among  Asians,  77%  would  welcome  a  white  person  into  their  families,  71%  would  welcome  a 
Latino,  and  66%  would  welcome  a  black  person. 

a. We are interested in finding the 95% confidence interval for the percent of all black families that
would welcome a white person into their families. Define the Random Variables X and P',
in words.

b. Which distribution should you use for this problem? Explain your choice.

c. Construct a 95% confidence interval.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

Exercise  8.9.16 

Refer  to  the  problem  above. 

a.  Construct  three  95%  confidence  intervals. 

i:  Percent  of  all  Asians  that  would  welcome  a  white  person  into  their  families. 

ii:  Percent  of  all  Asians  that  would  welcome  a  Latino  into  their  families. 

iii:  Percent  of  all  Asians  that  would  welcome  a  black  person  into  their  families. 

b. Even though the three point estimates are different, do any of the confidence intervals overlap?
Which?

c. For any intervals that do overlap, in words, what does this imply about the significance of the
differences in the true proportions?

d. For any intervals that do not overlap, in words, what does this imply about the significance of
the differences in the true proportions?

Exercise  8.9.17  (Solution  on  p.  374.) 

A  camp  director  is  interested  in  the  mean  number  of  letters  each  child  sends  during  his/her  camp 
session.  The  population  standard  deviation  is  known  to  be  2.5.  A  survey  of  20  campers  is  taken. 
The  mean  from  the  sample  is  7.9  with  a  sample  standard  deviation  of  2.8. 

a. i. x̄ =

ii. σ =

iii. s_x =

iv. n =

v. n − 1 =

b. Define the Random Variables X and X̄, in words.

c. Which distribution should you use for this problem? Explain your choice.

d. Construct a 90% confidence interval for the population mean number of letters campers send
home.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

e. What will happen to the error bound and confidence interval if 500 campers are surveyed?
Why?

Exercise  8.9.18 

Stanford  University  conducted  a  study  of  whether  running  is  healthy  for  men  and  women  over 
age  50.  During  the  first  eight  years  of  the  study,  1.5%  of  the  451  members  of  the  50-Plus  Fitness 
Association  died.  We  are  interested  in  the  proportion  of  people  over  50  who  ran  and  died  in  the 
same  eight-year  period. 

a.  Define  the  Random  Variables  X  and  P',  in  words. 

b.  Which  distribution  should  you  use  for  this  problem?  Explain  your  choice. 

c. Construct a 97% confidence interval for the population proportion of people over 50 who ran
and died in the same eight-year period.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

d.  Explain  what  a  "97%  confidence  interval"  means  for  this  study. 

Exercise  8.9.19  (Solution  on  p.  374.) 

In  a  recent  sample  of  84  used  cars  sales  costs,  the  sample  mean  was  $6425  with  a  standard  deviation 
of  $3156.  Assume  the  underlying  distribution  is  approximately  normal. 

a.  Which  distribution  should  you  use  for  this  problem?  Explain  your  choice. 

b.  Define  the  Random  Variable  X,  in  words. 

c.  Construct  a  95%  confidence  interval  for  the  population  mean  cost  of  a  used  car. 

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

d. Explain what a "95% confidence interval" means for this study.

Exercise 8.9.20

A telephone poll of 1000 adult Americans was reported in an issue of Time Magazine. One of the
questions asked was "What is the main problem facing the country?" 20% answered "crime". We
are interested in the population proportion of adult Americans who feel that crime is the main
problem.

a.  Define  the  Random  Variables  X  and  P',  in  words. 

b.  Which  distribution  should  you  use  for  this  problem?  Explain  your  choice. 

c. Construct a 95% confidence interval for the population proportion of adult Americans who feel
that crime is the main problem.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

d. Suppose we want to lower the sampling error. What is one way to accomplish that?

e. The sampling error given by Yankelovich Partners, Inc. (which conducted the poll) is ± 3%. In
1-3 complete sentences, explain what the ± 3% represents.

Exercise  8.9.21  (Solution  on  p.  374.) 

Refer  to  the  above  problem.  Another  question  in  the  poll  was  "[How  much  are]  you  worried 
about  the  quality  of  education  in  our  schools?"  63%  responded  "a  lot".  We  are  interested  in  the 
population proportion of adult Americans who are worried a lot about the quality of education in
our  schools. 

1.  Define  the  Random  Variables  X  and  P',  in  words. 

2. Which distribution should you use for this problem? Explain your choice.

3. Construct a 95% confidence interval for the population proportion of adult Americans worried
a lot about the quality of education in our schools.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

4.  The  sampling  error  given  by  Yankelovich  Partners,  Inc.  (which  conducted  the  poll)  is  ±  3%. 
In  1-3  complete  sentences,  explain  what  the  ±  3%  represents. 

Exercise  8.9.22 

Six  different  national  brands  of  chocolate  chip  cookies  were  randomly  selected  at  the  supermarket. 
The  grams  of  fat  per  serving  are  as  follows:  8;  8;  10;  7;  9;  9.  Assume  the  underlying  distribution  is 
approximately  normal. 

a. Calculate a 90% confidence interval for the population mean grams of fat per serving of chocolate
chip cookies sold in supermarkets.

i.  State  the  confidence  interval. 

ii.  Sketch  the  graph. 

iii.  Calculate  the  error  bound. 

b. If you wanted a smaller error bound while keeping the same level of confidence, what should
have been changed in the study before it was done?

c. Go to the store and record the grams of fat per serving of six brands of chocolate chip cookies.

d. Calculate the mean.

e. Is the mean within the interval you calculated in part (a)? Did you expect it to be? Why or why
not?

Exercise  8.9.23 

A confidence interval for a proportion is given to be (-0.22, 0.34). Why doesn't the lower limit of
the  confidence  interval  make  practical  sense?  How  should  it  be  changed?  Why? 


8.9.1  Try  these  multiple  choice  questions. 

The next three problems refer to the following: According to a Field Poll, 79% of California adults
(actual results are 400 out of 506 surveyed) feel that "education and our schools" is one of the top issues
facing California. We wish to construct a 90% confidence interval for the true proportion of California
adults who feel that education and the schools is one of the top issues facing California. (Source:
http://field.com/fieldpollonline/subscribers/)

Exercise  8.9.24  (Solution  on  p.  374.) 

A point estimate for the true population proportion is:

A. 0.90

B.  1.27 

C.  0.79 

D.  400 


Exercise 8.9.25 (Solution on p. 374.)

A 90% confidence interval for the population proportion is:

A. (0.761, 0.820)

B. (0.125, 0.188)

C. (0.755, 0.826)

D. (0.130, 0.183)

Exercise 8.9.26 (Solution on p. 374.)

The error bound is approximately

A.  1.581 

B.  0.791 

C.  0.059 

D.  0.030 

The  next  two  problems  refer  to  the  following: 

A  quality  control  specialist  for  a  restaurant  chain  takes  a  random  sample  of  size  12  to  check  the  amount  of 
soda  served  in  the  16  oz.  serving  size.  The  sample  mean  is  13.30  with  a  sample  standard  deviation  of  1.55. 
Assume the underlying population is normally distributed.

Exercise  8.9.27  (Solution  on  p.  374.) 

Find  the  95%  Confidence  Interval  for  the  true  population  mean  for  the  amount  of  soda  served. 

A.  (12.42,14.18) 

B.  (12.32,14.29) 

C.  (12.50,14.10) 

D.  Impossible  to  determine 

Exercise 8.9.28 (Solution on p. 374.)

What is the error bound?

A. 0.87

B. 1.98

C. 0.99

D. 1.74

Exercise 8.9.29 (Solution on p. 374.)

What is meant by the term "90% confident" when constructing a confidence interval for a mean?

A. If we took repeated samples, approximately 90% of the samples would produce the same
confidence interval.

B. If we took repeated samples, approximately 90% of the confidence intervals calculated from
those samples would contain the sample mean.

C. If we took repeated samples, approximately 90% of the confidence intervals calculated from
those samples would contain the true value of the population mean.

D. If we took repeated samples, the sample mean would equal the population mean in
approximately 90% of the samples.

The next two problems refer to the following:

Five  hundred  and  eleven  (511)  homes  in  a  certain  southern  California  community  are  randomly  surveyed 
to  determine  if  they  meet  minimal  earthquake  preparedness  recommendations.  One  hundred  seventy-three 
(173)  of  the  homes  surveyed  met  the  minimum  recommendations  for  earthquake  preparedness  and  338  did 
not. 

Exercise  8.9.30  (Solution  on  p.  374.) 

Find the Confidence Interval at the 90% Confidence Level for the true population proportion of
southern California community homes meeting at least the minimum recommendations for
earthquake preparedness.

A. (0.2975, 0.3796)

B. (0.6270, 0.6959)

C.  (0.3041,0.3730) 

D.  (0.6204,0.7025) 


Exercise  8.9.31  (Solution  on  p.  374.) 

The point estimate for the population proportion of homes that do not meet the minimum
recommendations for earthquake preparedness is:

A.  0.6614 

B.  0.3386 

C.  173 

D. 338

The next three problems refer to the following situation: Suppose that a sample of 15 randomly chosen
people were put on a special weight loss diet. The amount of weight lost, in pounds, follows an unknown
distribution with mean equal to 12 pounds and standard deviation equal to 3 pounds. Assume that the
distribution for the weight loss is normal.

Exercise 8.10.1 (Solution on p. 375.)

To find the probability that the mean amount of weight lost by 15 people is no more than 14
pounds, the random variable should be:

A. The number of people who lost weight on the special weight loss diet

B. The number of people who were on the diet

C. The mean amount of weight lost by 15 people on the special weight loss diet

D. The total amount of weight lost by 15 people on the special weight loss diet

Exercise 8.10.2 (Solution on p. 375.)

Find the probability asked for in the previous problem.

Exercise 8.10.3 (Solution on p. 375.)

Find the 90th percentile for the mean amount of weight lost by 15 people.

The next three questions refer to the following situation: The time of occurrence of the first accident
during rush-hour traffic at a major intersection is uniformly distributed between the three hour interval 4
p.m. to 7 p.m. Let X = the amount of time (hours) it takes for the first accident to occur.

• So, if an accident occurs at 4 p.m., the amount of time, in hours, it took for the accident to occur is

Exercise 8.10.4 (Solution on p. 375.)

What is the probability that the time of occurrence is within the first half-hour or the last hour of
the period from 4 to 7 p.m.?

A. Cannot be determined from the information given

Exercise 8.10.5 (Solution on p. 375.)

The 20th percentile occurs after how many hours?

A. 0.20

B. 0.60

C. 0.50

D. 1

Exercise 8.10.6 (Solution on p. 375.)

Assume Ramon has kept track of the times for the first accidents to occur for 40 different days. Let
C = the total cumulative time. Then C follows which distribution?

A. U(0, 3)

B. Exp(1)

C. N(60, 5.477)

D. N(1.5, 0.01875)

Exercise  8.10.7  (Solution  on  p.  375.) 

Using  the  information  in  question  #6,  find  the  probability  that  the  total  time  for  all  first  accidents 
to  occur  is  more  than  43  hours. 

The  next  two  questions  refer  to  the  following  situation:  The  length  of  time  a  parent  must  wait  for  his 
children  to  clean  their  rooms  is  uniformly  distributed  in  the  time  interval  from  1  to  15  days. 

Exercise  8.10.8  (Solution  on  p.  375.) 

How  long  must  a  parent  expect  to  wait  for  his  children  to  clean  their  rooms? 

A.  8  days 

B.  3  days 

C.  14  days 

D.  6  days 


Exercise  8.10.9  (Solution  on  p.  375.) 

What is the probability that a parent will wait more than 6 days given that the parent has already
waited  more  than  3  days? 

A.  0.5174 

B.  0.0174 

C.  0.7500 

D.  0.2143 


The next five problems refer to the following study: Twenty percent of the students at a local community
college live within five miles of the campus. Thirty percent of the students at the same community college
receive some kind of financial aid. Of those who live within five miles of the campus, 75% receive some
kind of financial aid.

Exercise  8.10.10  (Solution  on  p.  375.) 

Find  the  probability  that  a  randomly  chosen  student  at  the  local  community  college  does  not  live 
within  five  miles  of  the  campus. 

A.  80% 

B.  20% 

C.  30% 

D.  Cannot  be  determined 


Exercise  8.10.11  (Solution  on  p.  375.) 

Find  the  probability  that  a  randomly  chosen  student  at  the  local  community  college  lives  within 
five  miles  of  the  campus  or  receives  some  kind  of  financial  aid. 

A.  50% 

B.  35% 

C.  27.5% 

D.  75% 


Exercise  8.10.12  (Solution  on  p.  375.) 

Based  upon  the  above  information,  are  living  in  student  housing  within  five  miles  of  the  campus 
and receiving some kind of financial aid mutually exclusive?

A. Yes

B.  No 

C.  Cannot  be  determined 


Exercise 8.10.13 (Solution on p. 375.)

The interest rate charged on the financial aid is _______ data.


A.  quantitative  discrete 

B.  quantitative  continuous 

C.  qualitative  discrete 

D.  qualitative 


Exercise 8.10.14 (Solution on p. 375.)

What follows is information about the students who receive financial aid at the local community
college.

•  1st  quartile  =  $250 

•  2nd  quartile  =  $700 

•  3rd  quartile  =  $1200 

(These  amounts  are  for  the  school  year.)  If  a  sample  of  200  students  is  taken,  how  many  are 
expected  to  receive  $250  or  more? 

A.  50 

B.  250 

C.  150 

D.  Cannot  be  determined 

The next two problems refer to the following information: P(A) = 0.2, P(B) = 0.3, A and B are
independent events.

Exercise 8.10.15 (Solution on p. 375.)

P(A AND B) =

A. 0.5

B. 0.6

C. 0

D. 0.06

Exercise 8.10.16 (Solution on p. 375.)

P(A OR B) =

A. 0.56

B. 0.5

C. 0.44

D. 1

Exercise 8.10.17 (Solution on p. 375.)

If H and D are mutually exclusive events, P(H) = 0.25, P(D) = 0.15, then P(H|D) =

A. 1

B. 0

C. 0.40

D. 0.0375


8.11 Lab 1: Confidence Interval (Home Costs)

Class  Time: 
Names: 

8.11.1  Student  Learning  Outcomes: 

•  The  student  will  calculate  the  90%  confidence  interval  for  the  mean  cost  of  a  home  in  the  area  in  which 
this  school  is  located. 

•  The  student  will  interpret  confidence  intervals. 

• The student will determine the effects that changing conditions have on the confidence interval.


8.11.2  Collect  the  Data 

Check  the  Real  Estate  section  in  your  local  newspaper.  (Note:  many  papers  only  list  them  one  day  per 
week.  Also,  we  will  assume  that  homes  come  up  for  sale  randomly.)  Record  the  sales  prices  for  35  randomly 
selected  homes  recently  listed  in  the  county. 

1.  Complete  the  table: 


Table  8.2 


8.11.3  Describe  the  Data 

1.  Compute  the  following: 

a. x̄ =
b. s_x =
c. n =

2. Define the Random Variable X̄, in words. X̄ =

3. State the estimated distribution to use. Use both words and symbols.


8.11.4 Find the Confidence Interval

1.  Calculate  the  confidence  interval  and  the  error  bound. 

a.  Confidence  Interval: 

b.  Error  Bound: 

2. How much area is in both tails (combined)? α =

3. How much area is in each tail? α/2 =

4.  Fill  in  the  blanks  on  the  graph  with  the  area  in  each  section.  Then,  fill  in  the  number  line  with  the 
upper  and  lower  limits  of  the  confidence  interval  and  the  sample  mean. 


[Graph: blank normal curve over a number line labeled x̄]


Figure  8.5 


5.  Some  students  think  that  a  90%  confidence  interval  contains  90%  of  the  data.  Use  the  list  of  data  on 
the  first  page  and  count  how  many  of  the  data  values  lie  within  the  confidence  interval.  What  percent 
is  this?  Is  this  percent  close  to  90%?  Explain  why  this  percent  should  or  should  not  be  close  to  90%. 


8.11.5 Describe the Confidence Interval

1.  In  two  to  three  complete  sentences,  explain  what  a  Confidence  Interval  means  (in  general),  as  if  you 
were  talking  to  someone  who  has  not  taken  statistics. 

2.  In  one  to  two  complete  sentences,  explain  what  this  Confidence  Interval  means  for  this  particular 
study. 


8.11.6  Use  the  Data  to  Construct  Confidence  Intervals 

1. Using the above information, construct a confidence interval for each confidence level
given.


Confidence level    EBM / Error Bound    Confidence Interval
50%
80%
95%
99%

Table 8.3

2. What happens to the EBM as the confidence level increases? Does the width of the confidence interval
increase or decrease? Explain why this happens.


8.12 Lab 2: Confidence Interval (Place of Birth)

Class Time:
Names:


8.12.1  Student  Learning  Outcomes: 

•  The  student  will  calculate  the  90%  confidence  interval  for  proportion  of  students  in  this  school  that 
were  born  in  this  state. 

•  The  student  will  interpret  confidence  intervals. 

•  The  student  will  determine  the  effects  that  changing  conditions  have  on  the  confidence  interval. 


8.12.2  Collect  the  Data 

1. Survey the students in your class, asking them if they were born in this state. Let X = the number that
were born in this state.

a. n =
b. x =

2. Define the Random Variable P' in words.

3. State the estimated distribution to use.


8.12.3 Find the Confidence Interval and Error Bound

1. Calculate the confidence interval and the error bound.

a. Confidence Interval:

b. Error Bound:

2. How much area is in both tails (combined)? α =

3. How much area is in each tail? α/2 =

4. Fill in the blanks on the graph with the area in each section. Then, fill in the number line with the
upper and lower limits of the confidence interval and the sample proportion.

[Graph: blank normal curve with a central area labeled C.L. = ___ and a tail area of α/2 on each side, over a number line labeled p']

Figure 8.6


8.12.4 Describe the Confidence Interval

1.  In  two  to  three  complete  sentences,  explain  what  a  Confidence  Interval  means  (in  general),  as  if  you 
were  talking  to  someone  who  has  not  taken  statistics. 

2.  In  one  to  two  complete  sentences,  explain  what  this  Confidence  Interval  means  for  this  particular 
study. 

3. Using the above information, construct a confidence interval for each confidence level
given.


Confidence level    EBP / Error Bound    Confidence Interval
50%
80%
95%
99%

Table 8.4


4.  What  happens  to  the  EBP  as  the  confidence  level  increases?  Does  the  width  of  the  confidence  interval 
increase  or  decrease?  Explain  why  this  happens. 
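
The pattern that Table 8.4 asks about can be previewed with a short calculation. The sketch below assumes the normal approximation for a sample proportion and uses hypothetical survey counts (25 of 40 students); the function name `proportion_interval` is illustrative and not part of the lab.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, confidence):
    """Normal-approximation confidence interval for a population proportion."""
    p_prime = x / n
    # Critical value that leaves alpha/2 in each tail of the standard normal.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ebp = z * sqrt(p_prime * (1 - p_prime) / n)
    return ebp, (p_prime - ebp, p_prime + ebp)


# Hypothetical class survey: 25 of 40 students were born in this state.
x, n = 25, 40
levels = (0.50, 0.80, 0.95, 0.99)
results = {cl: proportion_interval(x, n, cl) for cl in levels}

for cl, (ebp, (low, high)) in results.items():
    print(f"{cl:.0%}  EBP = {ebp:.3f}  CI = ({low:.3f}, {high:.3f})")
```

With the sample held fixed, only the critical value changes from row to row, so the error bound and the interval width move together with the confidence level.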

8.13 Lab 3: Confidence Interval (Womens' Heights)

Class  Time: 
Names: 

8.13.1  Student  Learning  Outcomes: 

•  The  student  will  calculate  a  90%  confidence  interval  using  the  given  data. 

• The student will determine the relationship between the confidence level and the percent of
constructed intervals that contain the population mean.

8.13.2  Given: 

1.  Heights  of  100  Women  (in  Inches) 


59.4  71.6  69.3  65.0  62.9  66.5  61.7  55.2  67.5  67.2
63.8  62.9  63.0  63.9  68.7  65.5  61.9  69.6  58.7  63.4
61.8  60.6  69.8  60.0  64.9  66.1  66.8  60.6  65.6  63.8
61.3  59.2  64.1  59.3  64.9  62.4  63.5  60.9  63.3  66.3
61.5  64.3  62.9  60.6  63.8  58.8  64.9  65.7  62.5  70.9
62.9  63.1  62.2  58.7  64.7  66.0  60.5  64.7  65.4  60.2
65.0  64.1  61.1  65.3  64.6  59.2  61.4  62.0  63.5  61.4
65.5  62.3  65.5  64.7  58.8  66.1  64.9  66.9  57.9  69.8
58.5  63.4  69.2  65.9  62.2  60.0  58.1  62.5  62.4  59.1
66.4  61.2  60.4  58.7  66.7  67.5  63.2  56.6  67.7  62.5

Table  8.5 


Listed above are the heights of 100 women. Use a random number generator to randomly select 10
data values.

This content is available online at <http://cnx.org/content/m16964/1.12/>.

2. Calculate the sample mean and sample standard deviation. Assume that the population standard
deviation is known to be 3.3 inches. With these values, construct a 90% confidence interval for your
sample of 10 values. Write the confidence interval you obtained in the first space of the table below.

3. Now write your confidence interval on the board. As others in the class write their confidence
intervals on the board, copy them into the table below:

90%  Confidence  Intervals 


Table  8.6 


8.13.3  Discussion  Questions 

1. The actual population mean for the 100 heights given above is μ = 63.4. Using the class listing of
confidence intervals, count how many of them contain the population mean μ; i.e., for how many
intervals does the value of μ lie between the endpoints of the confidence interval?

2. Divide this number by the total number of confidence intervals generated by the class to determine
the percent of confidence intervals that contain the mean μ. Write this percent below.

3. Is the percent of confidence intervals that contain the population mean μ close to 90%?

4. Suppose we had generated 100 confidence intervals. What do you think would happen to the percent
of confidence intervals that contained the population mean?

5. When we construct a 90% confidence interval, we say that we are 90% confident that the true
population mean lies within the confidence interval. Using complete sentences, explain what we mean
by this phrase.

6.  Some  students  think  that  a  90%  confidence  interval  contains  90%  of  the  data.  Use  the  list  of  data  given 
(the  heights  of  women)  and  count  how  many  of  the  data  values  lie  within  the  confidence  interval  that 
you  generated  on  that  page.  How  many  of  the  100  data  values  lie  within  your  confidence  interval? 
What  percent  is  this?  Is  this  percent  close  to  90%? 

7.  Explain  why  it  does  not  make  sense  to  count  data  values  that  lie  in  a  confidence  interval.  Think  about 
the  random  variable  that  is  being  used  in  the  problem. 

8. Suppose you obtained the heights of 10 women and calculated a confidence interval from this
information. Without knowing the population mean μ, would you have any way of knowing for certain
if your interval actually contained the value of μ? Explain.


NOTE:  This  lab  was  designed  and  contributed  by  Diane  Mathios. 
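
The class experiment in this lab can also be imitated on a computer. The sketch below is only an approximation of the lab: instead of drawing from the 100 listed heights, it assumes a normal population with μ = 63.4 and σ = 3.3, and the function name, seed, and number of trials are arbitrary choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def coverage(mu, sigma, n, confidence, trials, seed=1):
    """Fraction of simulated confidence intervals that contain mu."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ebm = z * sigma / sqrt(n)  # sigma is treated as known, as in the lab
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = mean(sample)
        if x_bar - ebm <= mu <= x_bar + ebm:
            hits += 1
    return hits / trials


rate = coverage(mu=63.4, sigma=3.3, n=10, confidence=0.90, trials=2000)
print(f"{rate:.1%} of the simulated intervals contain mu")
```

Each simulated interval either contains μ or it does not; the 90% describes the long-run share of intervals that do, which is the idea behind discussion questions 3 through 5.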

Solutions to Exercises in Chapter 8

Solutions  to  Practice  1:  Confidence  Intervals  for  Means,  Known  Population  Standard 
Deviation 

Solution  to  Exercise  8.6.1  (p.  345) 
30.4 

Solution  to  Exercise  8.6.2  (p.  345) 

25 

Solution  to  Exercise  8.6.3  (p.  345) 

σ

Solution  to  Exercise  8.6.4  (p.  345) 

the  mean  age  of  25  randomly  selected  Winter  Foothill  students 
Solution  to  Exercise  8.6.5  (p.  345) 

μ

Solution  to  Exercise  8.6.6  (p.  345) 

yes 

Solution  to  Exercise  8.6.7  (p.  345) 

Normal 

Solution  to  Exercise  8.6.8  (p.  345) 

0.05 

Solution  to  Exercise  8.6.9  (p.  345) 
0.025 

Solution  to  Exercise  8.6.10  (p.  345) 

a.  24.52 

b.  36.28 

c.  5.88 

Solution  to  Exercise  8.6.11  (p.  346) 

(24.52,36.28) 

Solutions  to  Practice  2:  Confidence  Intervals  for  Means,  Unknown  Population  Stan- 
dard Deviation 

Solution  to  Exercise  8.7.1  (p.  347) 

a.  3.26 

b.  1.02 

c.  39 

Solution  to  Exercise  8.7.2  (p.  347) 

the  mean  number  of  colors  of  39  flags 


Solution to Exercise 8.7.3 (p. 347)

μ

Solution to Exercise 8.7.4 (p. 347)

No

Solution to Exercise 8.7.5 (p. 347)

t_38

Solution to Exercise 8.7.6 (p. 348)

0.05

Solution to Exercise 8.7.7 (p. 348)

0.025

Solution to Exercise 8.7.8 (p. 348)

a. 2.93

b.  3.59 

c.  0.33 

Solution  to  Exercise  8.7.9  (p.  348) 

2.93;  3.59 

Solutions  to  Practice  3:  Confidence  Intervals  for  Proportions 

Solution  to  Exercise  8.8.2  (p.  349) 

The  number  of  girls,  age  8-12,  in  the  beginning  ice  skating  class 
Solution  to  Exercise  8.8.3  (p.  349) 

a.  64 

b.  80 

c.  0.8 

Solution  to  Exercise  8.8.4  (p.  349) 

B(80, 0.80)

Solution  to  Exercise  8.8.5  (p.  349) 

P 

Solution  to  Exercise  8.8.6  (p.  349) 

The  proportion  of  girls,  age  8-12,  in  the  beginning  ice  skating  class. 
Solution  to  Exercise  8.8.8  (p.  349) 

1  -  0.92  =  0.08 

Solution  to  Exercise  8.8.9  (p.  349) 
0.04 

Solution  to  Exercise  8.8.10  (p.  349) 

a.  0.72 

b.  0.88 

c.  0.08 

Solution  to  Exercise  8.8.11  (p.  350) 
(0.72;  0.88) 

Solutions  to  Homework 
Solution  to  Exercise  8.9.1  (p.  351) 

a.  i.  71 

ii.  3 

iii.  2.8 

iv. 48

v. 47


d.  i.  CI:  (70.15,71.85) 
iii.  EB  =  0.85 

Solution  to  Exercise  8.9.3  (p.  351) 

a.  i.  8629 

ii.  6944 

iii. 35

iv. 34


c. t_34

d.  i.  CI:  (6244, 11,014) 

iii.  EB  =  2385 

e.  It  will  become  smaller 

Solution  to  Exercise  8.9.5  (p.  352) 

a.  i.  8 

ii.  4 

iii.  81 

iv.  80 

c. t_80

d.  i.  CI:  (7.12, 8.88) 

iii.  EB  =  0.88 

Solution  to  Exercise  8.9.7  (p.  353) 

a.  i.  2 

ii.  0.1 

iii.  0.12 

iv. 16

v. 15

b.  the  weight  of  1  small  bag  of  candies 

c.  the  mean  weight  of  16  small  bags  of  candies 


e.  i.  CI:  (1.96,  2.04) 

iii.  EB  =  0.04 

f.  i.  CI:  (1.94, 2.06) 

iii.  EB  =  0.06 

Solution  to  Exercise  8.9.9  (p.  354) 

a.  i.  6 

ii.  3 

iii.  14 

iv.  13 

b.  the  time  for  a  child  to  remove  his  training  wheels 

c. the mean time for 14 children to remove their training wheels.

d. t_13

e.  i.  CI:  (3.58,  8.42) 

iii.  EB  =  2.42 

Solution  to  Exercise  8.9.11  (p.  354) 

a.  i.  320 

ii. 400

iii.  0.80 


d.  i.  CI:  (0.76, 0.84) 
iii.  EB  =  0.04 


Solution  to  Exercise  8.9.13  (p.  355) 

b. N(0.61, √((0.61)(0.39)/n))

c.  i.  CI:  (0.59, 0.63) 

iii.  EB  =  0.02 

Solution  to  Exercise  8.9.15  (p.  356) 

c.  i.  CI:  (0.823,  0.898) 

iii.  EB  =  0.038 

Solution  to  Exercise  8.9.17  (p.  356) 

a.  i.  7.9 

ii.  2.5 

iii.  2.8 

iv. 20

v. 19

d.  i.  CI:  (6.98, 8.82) 

iii.  EB:  0.92 
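
The interval and error bound in Solution 8.9.17 can be checked with a few lines of code. This is a minimal sketch that assumes the normal distribution applies because the population standard deviation is given; the function name is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def mean_interval_known_sigma(x_bar, sigma, n, confidence):
    """Confidence interval for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    ebm = z * sigma / sqrt(n)
    return ebm, (x_bar - ebm, x_bar + ebm)


# Values from Exercise 8.9.17: sample mean 7.9, sigma 2.5, n = 20, 90% level.
ebm, (low, high) = mean_interval_known_sigma(x_bar=7.9, sigma=2.5, n=20, confidence=0.90)
print(round(ebm, 2), round(low, 2), round(high, 2))
```

The sample standard deviation of 2.8 is not used here, because the known population value of 2.5 takes its place in the error bound.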

Solution  to  Exercise  8.9.19  (p.  357) 

a. t_83

b.  mean  cost  of  84  used  cars 

c.  i.  CI:  (5740.10,  7109.90) 

iii.  EB  =  684.90 

Solution  to  Exercise  8.9.21  (p.  358) 

b. N(0.63, √((0.63)(0.37)/1000))

c.  i.  CI:  (0.60, 0.66) 

iii.  EB  =  0.03 
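
The error bound in Solution 8.9.21 lines up with the ± 3% sampling error quoted for the poll. The sketch below assumes the normal approximation for a proportion and n = 1000 from the poll description; the second call, with p' = 0.5, is an added illustration of the largest error bound possible at that sample size, not something stated in the exercise.

```python
from math import sqrt
from statistics import NormalDist


def error_bound(p_prime, n, confidence):
    """Error bound for a proportion under the normal approximation."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * sqrt(p_prime * (1 - p_prime) / n)


# Exercise 8.9.21: 63% of 1000 adults, 95% confidence level.
eb_education = error_bound(0.63, 1000, 0.95)

# Illustration only: p' = 0.5 maximizes p'(1 - p'), so it gives the widest bound.
eb_widest = error_bound(0.50, 1000, 0.95)

print(round(eb_education, 3), round(eb_widest, 3))
```

Because the widest bound for 1000 respondents is still about 0.03, a single ± 3% figure can reasonably be quoted for every question in the poll.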

Solution  to  Exercise  8.9.24  (p.  358) 

C 

Solution  to  Exercise  8.9.25  (p.  359) 
A 

Solution  to  Exercise  8.9.26  (p.  359) 

D 

Solution  to  Exercise  8.9.27  (p.  359) 

B 

Solution  to  Exercise  8.9.28  (p.  359) 
C 

Solution  to  Exercise  8.9.29  (p.  359) 

C 

Solution  to  Exercise  8.9.30  (p.  360) 

C 

Solution  to  Exercise  8.9.31  (p.  360) 

A 

Solutions to Review

Solution to Exercise 8.10.1 (p. 361)

C

Solution to Exercise 8.10.2 (p. 361)

0.9951

Solution to Exercise 8.10.3 (p. 361)

12.99

Solution to Exercise 8.10.4 (p. 361)

C

Solution to Exercise 8.10.5 (p. 361)

B

Solution to Exercise 8.10.6 (p. 361)

C

Solution to Exercise 8.10.7 (p. 362)

0.9990

Solution to Exercise 8.10.8 (p. 362)

A

Solution to Exercise 8.10.9 (p. 362)

C

Solution to Exercise 8.10.10 (p. 362)

A

Solution to Exercise 8.10.11 (p. 362)

B

Solution to Exercise 8.10.12 (p. 362)

B

Solution to Exercise 8.10.13 (p. 363)

B

Solution to Exercise 8.10.14 (p. 363)

C. 150

Solution to Exercise 8.10.15 (p. 363)

D

Solution to Exercise 8.10.16 (p. 363)

C

Solution to Exercise 8.10.17 (p. 363)

B
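
The numeric review answers for Exercises 8.10.2, 8.10.3, 8.10.6, and 8.10.7 can be reproduced as follows. This is a sketch that relies on the normal models for a sample mean and for a sum (the Central Limit Theorem); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Mean weight loss of 15 people: mean 12, standard deviation 3 / sqrt(15).
x_bar = NormalDist(mu=12, sigma=3 / sqrt(15))
p_at_most_14 = x_bar.cdf(14)       # Exercise 8.10.2
pct_90 = x_bar.inv_cdf(0.90)       # Exercise 8.10.3

# Total of 40 first-accident times, each uniform on (0, 3):
# mean 40 * 1.5 and standard deviation sqrt(40) * sqrt(3**2 / 12).
total = NormalDist(mu=40 * 1.5, sigma=sqrt(40) * sqrt(3 ** 2 / 12))
p_more_than_43 = 1 - total.cdf(43)  # Exercise 8.10.7

print(round(p_at_most_14, 4), round(pct_90, 2), round(total.stdev, 3), round(p_more_than_43, 4))
```

The standard deviation of the total matches the 5.477 in answer choice C of Exercise 8.10.6.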

Chapter 9


Hypothesis  Testing:  Single  Mean  and 
Single  Proportion 

9.1 Hypothesis Testing: Single Mean and Single Proportion

9.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

•  Differentiate  between  Type  I  and  Type  II  Errors 

•  Describe  hypothesis  testing  in  general  and  in  practice 

•  Conduct  and  interpret  hypothesis  tests  for  a  single  population  mean,  population  standard  deviation 
known. 

• Conduct and interpret hypothesis tests for a single population mean, population standard deviation
unknown.

•  Conduct  and  interpret  hypothesis  tests  for  a  single  population  proportion. 


9.1.2  Introduction 

One  job  of  a  statistician  is  to  make  statistical  inferences  about  populations  based  on  samples  taken  from  the 
population. Confidence intervals are one way to estimate a population parameter. Another way to make
a  statistical  inference  is  to  make  a  decision  about  a  parameter.  For  instance,  a  car  dealer  advertises  that 
its  new  small  truck  gets  35  miles  per  gallon,  on  the  average.  A  tutoring  service  claims  that  its  method  of 
tutoring  helps  90%  of  its  students  get  an  A  or  a  B.  A  company  says  that  women  managers  in  their  company 
earn  an  average  of  $60,000  per  year. 

A statistician will make a decision about these claims. This process is called "hypothesis testing." A
hypothesis test involves collecting data from a sample and evaluating the data. Then, the statistician makes a
decision as to whether or not there is sufficient evidence, based upon analyses of the data, to reject the null
hypothesis.

In  this  chapter,  you  will  conduct  hypothesis  tests  on  single  means  and  single  proportions.  You  will  also 
learn  about  the  errors  associated  with  these  tests. 

This content is available online at <http://cnx.org/content/m16997/1.11/>.

Hypothesis testing consists of two contradictory hypotheses or statements, a decision based on the data, and a conclusion. To perform a hypothesis test, a statistician will:

1. Set up two contradictory hypotheses.

2.  Collect  sample  data  (in  homework  problems,  the  data  or  summary  statistics  will  be  given  to  you). 

3. Determine the correct distribution to perform the hypothesis test.

4.  Analyze  sample  data  by  performing  the  calculations  that  ultimately  will  allow  you  to  reject  or  fail  to 
reject  the  null  hypothesis. 

5.  Make  a  decision  and  write  a  meaningful  conclusion. 

NOTE: To do the hypothesis test homework problems for this chapter and later chapters, make
copies  of  the  appropriate  special  solution  sheets.  See  the  Table  of  Contents  topic  "Solution  Sheets". 


9.2 Null and Alternate Hypotheses

The  actual  test  begins  by  considering  two  hypotheses.  They  are  called  the  null  hypothesis  and  the  alternate 
hypothesis.  These  hypotheses  contain  opposing  viewpoints. 

Ho: The null hypothesis: It is a statement about the population that will be assumed to be true unless it can be shown to be incorrect beyond a reasonable doubt.

Ha: The alternate hypothesis: It is a claim about the population that is contradictory to Ho and what we conclude when we reject Ho.

Example  9.1 

Ho: No more than 30% of the registered voters in Santa Clara County voted in the primary election.

Ha: More than 30% of the registered voters in Santa Clara County voted in the primary election.

Example  9.2 

We  want  to  test  whether  the  mean  grade  point  average  in  American  colleges  is  different  from  2.0 
(out  of  4.0). 

Ho: μ = 2.0       Ha: μ ≠ 2.0

Example  9.3 

We want to test if college students take less than five years to graduate from college, on the average.

Ho: μ ≥ 5       Ha: μ < 5

Example  9.4 

In an issue of U. S. News and World Report, an article on school standards stated that about half of all students in France, Germany, and Israel take advanced placement exams and a third pass. The same article stated that 6.6% of U. S. students take advanced placement exams and 4.4% pass. Test if the percentage of U. S. students who take advanced placement exams is more than 6.6%.

Ho: p = 0.066       Ha: p > 0.066

Since the null and alternate hypotheses are contradictory, you must examine evidence to decide if you have enough evidence to reject the null hypothesis or not. The evidence is in the form of sample data.

After you have determined which hypothesis the sample supports, you make a decision. There are two options for a decision. They are "reject Ho" if the sample information favors the alternate hypothesis or "do not reject Ho" or "fail to reject Ho" if the sample information is insufficient to reject the null hypothesis.

This content is available online at <http://cnx.org/content/m16998/1.14/>.

Mathematical Symbols Used in Ho and Ha:

Ho                             | Ha
equal (=)                      | not equal (≠) or greater than (>) or less than (<)
greater than or equal to (≥)   | less than (<)
less than or equal to (≤)      | more than (>)

Table 9.1


NOTE: Ho always has a symbol with an equal in it. Ha never has a symbol with an equal in it. The choice of symbol depends on the wording of the hypothesis test. However, be aware that many researchers (including one of the co-authors in research work) use = in the Null Hypothesis, even with > or < as the symbol in the Alternate Hypothesis. This practice is acceptable because we only make the decision to reject or not reject the Null Hypothesis.

9.2.1  Optional  Collaborative  Classroom  Activity 

Bring to class a newspaper, some news magazines, and some Internet articles. In groups, find articles from which your group can write null and alternate hypotheses. Discuss your hypotheses with the rest of the class.

9.3 Outcomes and the Type I and Type II Errors

When  you  perform  a  hypothesis  test,  there  are  four  possible  outcomes  depending  on  the  actual  truth  (or 
falseness) of the null hypothesis Ho and the decision to reject or not. The outcomes are summarized in the
following  table: 


ACTION             | Ho IS ACTUALLY True | Ho IS ACTUALLY False
Do not reject Ho   | Correct Outcome     | Type II error
Reject Ho          | Type I error        | Correct Outcome

Table 9.2


The four possible outcomes in the table are:

• The decision is to not reject Ho when, in fact, Ho is true (correct decision).

• The decision is to reject Ho when, in fact, Ho is true (incorrect decision known as a Type I error).

• The decision is to not reject Ho when, in fact, Ho is false (incorrect decision known as a Type II error).

• The decision is to reject Ho when, in fact, Ho is false (correct decision whose probability is called the Power of the Test).

This content is available online at <http://cnx.org/content/m17006/1.8/>.

Each of the errors occurs with a particular probability. The Greek letters α and β represent the probabilities.

α = probability of a Type I error = P(Type I error) = probability of rejecting the null hypothesis when the null hypothesis is true.

β = probability of a Type II error = P(Type II error) = probability of not rejecting the null hypothesis when the null hypothesis is false.

α and β should be as small as possible because they are probabilities of errors. They are rarely 0.

The Power of the Test is 1 - β. Ideally, we want a high power that is as close to 1 as possible. Increasing the sample size can increase the Power of the Test.
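
To see how the sample size affects the Power of the Test, the following minimal sketch computes the power and β for a left-tailed test of a mean with a known population standard deviation. The function name and all numbers (a null mean of 100, a true mean of 97, σ = 10) are hypothetical values chosen only for illustration; they do not come from the text.

```python
from math import sqrt
from statistics import NormalDist

def power_left_tailed_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a left-tailed z-test (Ha: mu < mu0) when the true mean is mu_true."""
    se = sigma / sqrt(n)                   # standard deviation of the sample mean
    z_crit = NormalDist().inv_cdf(alpha)   # reject Ho when the z-score is below this
    x_crit = mu0 + z_crit * se             # the same cutoff on the sample-mean scale
    # Power = P(sample mean falls below the cutoff) when the true mean is mu_true
    return NormalDist(mu_true, se).cdf(x_crit)

# Hypothetical values, for illustration only
for n in (10, 30, 100):
    power = power_left_tailed_z(mu0=100, mu_true=97, sigma=10, n=n)
    print(f"n = {n:3d}: power = {power:.3f}, beta = {1 - power:.3f}")
```

With α held fixed, the printed power rises and β falls as n grows.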

The  following  are  examples  of  Type  I  and  Type  II  errors. 
Example  9.5 

Suppose the null hypothesis, Ho, is: Frank's rock climbing equipment is safe.

Type I error: Frank thinks that his rock climbing equipment may not be safe when, in fact, it really is safe.
Type II error: Frank thinks that his rock climbing equipment may be safe when, in fact, it is not safe.

α = probability that Frank thinks his rock climbing equipment may not be safe when, in fact, it really is safe.
β = probability that Frank thinks his rock climbing equipment may be safe when, in fact, it is not safe.

Notice that, in this case, the error with the greater consequence is the Type II error. (If Frank thinks his rock climbing equipment is safe, he will go ahead and use it.)

Example  9.6 

Suppose the null hypothesis, Ho, is: The victim of an automobile accident is alive when he arrives at the emergency room of a hospital.

Type I error: The emergency crew thinks that the victim is dead when, in fact, the victim is alive.
Type II error: The emergency crew does not know if the victim is alive when, in fact, the victim is dead.

α = probability that the emergency crew thinks the victim is dead when, in fact, he is really alive = P(Type I error).
β = probability that the emergency crew does not know if the victim is alive when, in fact, the victim is dead = P(Type II error).

The  error  with  the  greater  consequence  is  the  Type  I  error.  (If  the  emergency  crew  thinks  the  victim 
is  dead,  they  will  not  treat  him.) 


9.4 Distribution Needed for Hypothesis Testing

Earlier in the course, we discussed sampling distributions. Particular distributions are associated with hypothesis testing. Perform tests of a population mean using a normal distribution or a Student's-t distribution. (Remember, use a Student's-t distribution when the population standard deviation is unknown and the distribution of the sample mean is approximately normal.) In this chapter we perform tests of a population proportion using a normal distribution (usually n is large or the sample size is large).

If you are testing a single population mean, the distribution for the test is for means:

X̄ ~ N(μ_X, σ_X/√n)        or        t_df


The population parameter is μ. The estimated value (point estimate) for μ is x̄, the sample mean.

If you are testing a single population proportion, the distribution for the test is for proportions or percentages:

P' ~ N(p, √(pq/n))

The population parameter is p. The estimated value (point estimate) for p is p'. p' = x/n where x is the number of successes and n is the sample size.

This content is available online at <http://cnx.org/content/m17017/1.13/>.

9.5 Assumption

When you perform a hypothesis test of a single population mean μ using a Student's-t distribution (often called a t-test), there are fundamental assumptions that need to be met in order for the test to work properly. Your data should be a simple random sample that comes from a population that is approximately normally distributed. You use the sample standard deviation to approximate the population standard deviation. (Note that if the sample size is sufficiently large, a t-test will work even if the population is not approximately normally distributed.)

When you perform a hypothesis test of a single population mean μ using a normal distribution (often
called  a  z-test),  you  take  a  simple  random  sample  from  the  population.  The  population  you  are  testing 
is  normally  distributed  or  your  sample  size  is  sufficiently  large.  You  know  the  value  of  the  population 
standard  deviation. 

When you perform a hypothesis test of a single population proportion p, you take a simple random sample from the population. You must meet the conditions for a binomial distribution: there are a certain number n of independent trials, the outcomes of any trial are success or failure, and each trial has the same probability of a success p. The shape of the binomial distribution needs to be similar to the shape of the normal distribution. To ensure this, the quantities np and nq must both be greater than five (np > 5 and nq > 5). Then the binomial distribution of the sample (estimated) proportion can be approximated by the normal distribution with μ = p and σ = √(pq/n). Remember that q = 1 - p.
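
The rule of thumb for proportions can be checked mechanically. The sketch below is only an illustration: the function name and the sample values passed to it are assumptions, not part of the text. It tests np > 5 and nq > 5 and, when both hold, returns the mean and standard deviation of the approximating normal distribution.

```python
from math import sqrt

def normal_approx_for_proportion(n, p):
    """Return (mean, std dev) of the sample proportion if np > 5 and nq > 5, else None."""
    q = 1 - p
    if n * p > 5 and n * q > 5:
        return p, sqrt(p * q / n)   # mu = p, sigma = sqrt(pq/n)
    return None                     # rule of thumb not met; do not use the normal curve

# Hypothetical inputs, for illustration only
print(normal_approx_for_proportion(100, 0.5))   # np = 50 and nq = 50, condition met
print(normal_approx_for_proportion(20, 0.1))    # np = 2, condition not met
```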

9.6 Rare Events

Suppose you make an assumption about a property of the population (this assumption is the null hypothesis). Then you gather sample data randomly. If the sample has properties that would be very unlikely to occur if the assumption is true, then you would conclude that your assumption about the population is probably incorrect. (Remember that your assumption is just an assumption: it is not a fact and it may or may not be true. But your sample data are real and the data are showing you a fact that seems to contradict your assumption.)

For  example,  Didi  and  Ali  are  at  a  birthday  party  of  a  very  wealthy  friend.  They  hurry  to  be  first  in  line 
to  grab  a  prize  from  a  tall  basket  that  they  cannot  see  inside  because  they  will  be  blindfolded.  There  are 
200  plastic  bubbles  in  the  basket  and  Didi  and  Ali  have  been  told  that  there  is  only  one  with  a  $100  bill. 
Didi  is  the  first  person  to  reach  into  the  basket  and  pull  out  a  bubble.  Her  bubble  contains  a  $100  bill.  The 
probability of this happening is 1/200 = 0.005. Because this is so unlikely, Ali is hoping that what the two
of  them  were  told  is  wrong  and  there  are  more  $100  bills  in  the  basket.  A  "rare  event"  has  occurred  (Didi 
getting  the  $100  bill)  so  Ali  doubts  the  assumption  about  only  one  $100  bill  being  in  the  basket. 

This content is available online at <http://cnx.org/content/m17002/1.16/>.

9.7 Using the Sample to Test the Null Hypothesis

Use the sample data to calculate the actual probability of getting the test result, called the p-value. The p-value is the probability that, if the null hypothesis is true, the results from another randomly selected sample will be as extreme as or more extreme than the results obtained from the given sample.

A large p-value calculated from the data indicates that we should fail to reject the null hypothesis. The smaller the p-value, the more unlikely the outcome, and the stronger the evidence is against the null hypothesis. We would reject the null hypothesis if the evidence is strongly against it.

Draw  a  graph  that  shows  the  p-value.  The  hypothesis  test  is  easier  to  perform  if  you  use  a  graph  because 
you  see  the  problem  more  clearly. 

Example  9.7:  (to  illustrate  the  p-value) 

Suppose  a  baker  claims  that  his  bread  height  is  more  than  15  cm,  on  the  average.  Several  of  his 
customers  do  not  believe  him.  To  persuade  his  customers  that  he  is  right,  the  baker  decides  to  do  a 
hypothesis  test.  He  bakes  10  loaves  of  bread.  The  mean  height  of  the  sample  loaves  is  17  cm.  The 
baker  knows  from  baking  hundreds  of  loaves  of  bread  that  the  standard  deviation  for  the  height 
is  0.5  cm.  and  the  distribution  of  heights  is  normal. 

The null hypothesis could be Ho: μ ≤ 15. The alternate hypothesis is Ha: μ > 15.

The words "is more than" translate as a ">" so "μ > 15" goes into the alternate hypothesis. The null hypothesis must contradict the alternate hypothesis.

Since σ is known (σ = 0.5 cm), the distribution of the sample mean is known to be normal with mean μ = 15 and standard deviation σ/√n = 0.5/√10 = 0.16.

Suppose  the  null  hypothesis  is  true  (the  mean  height  of  the  loaves  is  no  more  than  15  cm).  Then 
is  the  mean  height  (17  cm)  calculated  from  the  sample  unexpectedly  large?  The  hypothesis  test 
works  by  asking  the  question  how  unlikely  the  sample  mean  would  be  if  the  null  hypothesis 
were  true.  The  graph  shows  how  far  out  the  sample  mean  is  on  the  normal  curve.  The  p-value  is 
the  probability  that,  if  we  were  to  take  other  samples,  any  other  sample  mean  would  fall  at  least 
as  far  out  as  17  cm. 

The  p-value,  then,  is  the  probability  that  a  sample  mean  is  the  same  or  greater  than  17  cm. 
when  the  population  mean  is,  in  fact,  15  cm.  We  can  calculate  this  probability  using  the  normal 
distribution  for  means  from  Chapter  7. 

[Graph: normal curve with 15 and 17 marked on the horizontal axis; the p-value is approximately 0.]

p-value = P(x̄ > 17), which is approximately 0.

This content is available online at <http://cnx.org/content/m16995/1.17/>.

A p-value of approximately 0 tells us that it is highly unlikely that a loaf of bread rises no more than 15 cm, on the average. That is, almost 0% of all samples of 10 loaves would have a mean height at least as high as 17 cm purely by CHANCE had the population mean height really been 15 cm. Because the outcome of 17 cm is so unlikely (meaning it is happening NOT by chance alone), we conclude that the evidence is strongly against the null hypothesis (the mean height is at most 15 cm). There is sufficient evidence that the true mean height for the population of the baker's loaves of bread is greater than 15 cm.
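
The p-value in the baker example can be checked numerically. The sketch below assumes Python's standard `statistics.NormalDist` class stands in for the normal table or calculator; the variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 15, 0.5, 10, 17   # values given in the baker example

se = sigma / sqrt(n)                     # standard deviation of the sample mean
# Right-tailed test: P(sample mean >= 17) if the population mean is really 15
p_value = 1 - NormalDist(mu0, se).cdf(x_bar)

print(f"standard deviation of the sample mean = {se:.2f}")
print(f"p-value = {p_value:.6f}")
```

The sample mean lies more than 12 standard deviations above 15, so the computed tail area is 0 to six decimal places.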


9.8 Decision and Conclusion

A systematic way to make a decision of whether to reject or not reject the null hypothesis is to compare the p-value and a preset or preconceived α (also called a "significance level"). A preset α is the probability of a Type I error (rejecting the null hypothesis when the null hypothesis is true). It may or may not be given to you at the beginning of the problem.

When you make a decision to reject or not reject Ho, do as follows:

• If α > p-value, reject Ho. The results of the sample data are significant. There is sufficient evidence to conclude that Ho is an incorrect belief and that the alternative hypothesis, Ha, may be correct.

• If α < p-value, do not reject Ho. The results of the sample data are not significant. There is not sufficient evidence to conclude that the alternative hypothesis, Ha, may be correct.

• When you "do not reject Ho", it does not mean that you should believe that Ho is true. It simply means that the sample data have failed to provide sufficient evidence to cast serious doubt about the truthfulness of Ho.

Conclusion:  After  you  make  your  decision,  write  a  thoughtful  conclusion  about  the  hypotheses  in  terms 
of  the  given  problem. 
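
The decision rule above can be written as a short function. This is a sketch only: the function name and the sample inputs are illustrative, and treating the boundary case α = p-value as "do not reject Ho" is an assumption, since the rule as stated covers only α > p-value and α < p-value.

```python
def decide(alpha, p_value):
    """Compare a preset alpha with the p-value and return the decision as text."""
    if alpha > p_value:
        return "reject Ho"          # results are significant
    # Assumption: the boundary case alpha == p_value is treated as not significant
    return "do not reject Ho"

# Illustrative inputs
print(decide(0.05, 0.0187))    # alpha > p-value
print(decide(0.025, 0.1323))   # alpha < p-value
```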

9.9 Additional Information

• In a hypothesis test problem, you may see words such as "the level of significance is 1%." The "1%" is the preconceived or preset α.

• The statistician setting up the hypothesis test selects the value of α to use before collecting the sample data.

• If no level of significance is given, the accepted standard is to use α = 0.05.

• When you calculate the p-value and draw the picture, the p-value is the area in the left tail, the right tail, or split evenly between the two tails. For this reason, we call the hypothesis test left, right, or two-tailed.

• The alternate hypothesis, Ha, tells you if the test is left, right, or two-tailed. It is the key to conducting the appropriate test.

• Ha never has a symbol that contains an equal sign.

• Thinking about the meaning of the p-value: A data analyst (and anyone else) should have more confidence that he made the correct decision to reject the null hypothesis with a smaller p-value (for example, 0.001 as opposed to 0.04) even if using the 0.05 level for alpha. Similarly, for a large p-value like 0.4, as opposed to a p-value of 0.056 (alpha = 0.05 is less than either number), a data analyst should have more confidence that she made the correct decision in failing to reject the null hypothesis. This makes the data analyst use judgment rather than mindlessly applying rules.

This content is available online at <http://cnx.org/content/m16992/1.11/>.

The following examples illustrate a left, right, and two-tailed test.

Example  9.8 

Ho: μ = 5       Ha: μ < 5

Test of a single population mean. Ha tells you the test is left-tailed. The picture of the p-value is as follows:

[Graph: curve centered at 5; the p-value is the area in the left tail.]

Example  9.9 

Ho: p ≤ 0.2       Ha: p > 0.2

This is a test of a single population proportion. Ha tells you the test is right-tailed. The picture of the p-value is as follows:

[Graph: the p-value is the area in the right tail.]


Example  9.10 

Ho: μ = 50       Ha: μ ≠ 50

This is a test of a single population mean. Ha tells you the test is two-tailed. The picture of the p-value is as follows.

[Graph: curve centered at 50; the p-value is split evenly between the two tails.]
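
The three kinds of tails differ only in which area under the curve is taken as the p-value. The sketch below assumes the test statistic is a z-score; the function name and the value z = -1.5 are hypothetical and serve only to show the three cases side by side.

```python
from statistics import NormalDist

def p_value_from_z(z, tail):
    """p-value for a z test statistic; tail is 'left', 'right', or 'two'."""
    std_normal = NormalDist()        # standard normal: mean 0, standard deviation 1
    if tail == "left":               # Ha uses <
        return std_normal.cdf(z)
    if tail == "right":              # Ha uses >
        return 1 - std_normal.cdf(z)
    if tail == "two":                # Ha uses "not equal": area split between both tails
        return 2 * (1 - std_normal.cdf(abs(z)))
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Hypothetical test statistic, for illustration only
for tail in ("left", "right", "two"):
    print(f"{tail:>5}-tailed: p-value = {p_value_from_z(-1.5, tail):.4f}")
```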
9.10 Summary of the Hypothesis Test

The hypothesis test itself has an established process. This can be summarized as follows:

1.  Determine  Ho  and  Ha.  Remember,  they  are  contradictory. 

2.  Determine  the  random  variable. 

3.  Determine  the  distribution  for  the  test. 

4.  Draw  a  graph,  calculate  the  test  statistic,  and  use  the  test  statistic  to  calculate  the  p-value.  (A  z-score 
and  a  t-score  are  examples  of  test  statistics.) 

5.  Compare  the  preconceived  a  with  the  p-value,  make  a  decision  (reject  or  do  not  reject  Ho),  and  write 
a  clear  conclusion  using  English  sentences. 

Notice that in performing the hypothesis test, you use α and not β. β is needed to help determine the sample size of the data that is used in calculating the p-value. Remember that the quantity 1 - β is called the Power of the Test. A high power is desirable. If the power is too low, statisticians typically increase the sample size while keeping α the same. If the power is low, the null hypothesis might not be rejected when it should be.


9.11 Examples

Example 9.11

Jeffrey, as an eight-year-old, established a mean time of 16.43 seconds for swimming the 25-yard freestyle, with a standard deviation of 0.8 seconds. His dad, Frank, thought that Jeffrey could swim the 25-yard freestyle faster by using goggles. Frank bought Jeffrey a new pair of expensive goggles and timed Jeffrey for 15 25-yard freestyle swims. For the 15 swims, Jeffrey's mean time was 16 seconds. Frank thought that the goggles helped Jeffrey to swim faster than the 16.43 seconds. Conduct a hypothesis test using a preset α = 0.05. Assume that the swim times for the 25-yard freestyle are normal.

Solution

Set up the Hypothesis Test:

Since the problem is about a mean, this is a test of a single population mean.

Ho: μ = 16.43       Ha: μ < 16.43

For Jeffrey to swim faster, his time will be less than 16.43 seconds. The "<" tells you this is left-tailed.

Determine the distribution needed:

Random variable: X̄ = the mean time to swim the 25-yard freestyle.

Distribution for the test: X̄ is normal (population standard deviation is known: σ = 0.8)

X̄ ~ N(μ, σ_X/√n). Therefore, X̄ ~ N(16.43, 0.8/√15)

μ = 16.43 comes from Ho and not the data. σ = 0.8, and n = 15.


Calculate the p-value using the normal distribution for a mean:

p-value = P(x̄ < 16) = 0.0187 where the sample mean in the problem is given as 16.

p-value = 0.0187 (This is called the actual level of significance.) The p-value is the area to the left of the sample mean, which is given as 16.

Graph:

[Figure 9.1: graph of the p-value, the area to the left of the sample mean 16.]

μ = 16.43 comes from Ho. Our assumption is μ = 16.43.

Interpretation of the p-value: If Ho is true, there is a 0.0187 probability (1.87%) that Jeffrey's mean time to swim the 25-yard freestyle is 16 seconds or less. Because a 1.87% chance is small, the mean time of 16 seconds or less is unlikely to have happened randomly. It is a rare event.

Compare α and the p-value:

α = 0.05       p-value = 0.0187       α > p-value

Make a decision: Since α > p-value, reject Ho.

This means that you reject μ = 16.43. In other words, you do not think Jeffrey swims the 25-yard freestyle in 16.43 seconds but faster with the new goggles.

Conclusion:  At  the  5%  significance  level,  we  conclude  that  Jeffrey  swims  faster  using  the  new 
goggles.  The  sample  data  show  there  is  sufficient  evidence  that  Jeffrey's  mean  time  to  swim  the 
25-yard  freestyle  is  less  than  16.43  seconds. 

The  p-value  can  easily  be  calculated  using  the  TI-83+  and  the  TI-84  calculators: 

Press STAT and arrow over to TESTS. Press 1:Z-Test. Arrow over to Stats and press ENTER. Arrow down and enter 16.43 for μ0 (null hypothesis), .8 for σ, 16 for the sample mean, and 15 for n. Arrow down to μ: (alternate hypothesis) and arrow over to <μ0. Press ENTER. Arrow down to Calculate and press ENTER. The calculator not only calculates the p-value (p = 0.0187) but it also calculates the test statistic (z-score) for the sample mean. μ < 16.43 is the alternate hypothesis. Do this set of instructions again except arrow to Draw (instead of Calculate). Press ENTER. A shaded graph appears with z = -2.08 (test statistic) and p = 0.0187 (p-value). Make sure when you use Draw that no other equations are highlighted in Y = and the plots are turned off.

When the calculator does a Z-Test, the Z-Test function finds the p-value by doing a normal probability calculation using the Central Limit Theorem:

P(x̄ < 16) = 2nd DISTR normcdf(-10^99, 16, 16.43, 0.8/√15).
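
For readers without a TI calculator, the same normal probability calculation can be sketched in Python. This is an illustrative equivalent under the assumption that the standard library class `statistics.NormalDist` plays the role of normcdf; it is not part of the calculator procedure described above.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar, alpha = 16.43, 0.8, 15, 16, 0.05   # values from the example

se = sigma / sqrt(n)                       # standard deviation of the sample mean
z = (x_bar - mu0) / se                     # test statistic (z-score)
p_value = NormalDist(mu0, se).cdf(x_bar)   # left-tailed: P(sample mean < 16)

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject Ho" if alpha > p_value else "do not reject Ho")
```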

The  Type  I  and  Type  II  errors  for  this  problem  are  as  follows: 

The  Type  I  error  is  to  conclude  that  Jeffrey  swims  the  25-yard  freestyle,  on  average,  in  less  than 
16.43  seconds  when,  in  fact,  he  actually  swims  the  25-yard  freestyle,  on  average,  in  16.43  seconds. 
(Reject  the  null  hypothesis  when  the  null  hypothesis  is  true.) 

The Type II error is that there is not evidence to conclude that Jeffrey swims the 25-yard freestyle, on average, in less than 16.43 seconds when, in fact, he actually does swim the 25-yard freestyle, on average, in less than 16.43 seconds. (Do not reject the null hypothesis when the null hypothesis is false.)


Historical Note: The traditional way to compare the two probabilities, α and the p-value, is to compare the critical value (z-score from α) to the test statistic (z-score from data). The calculated test statistic for the p-value is -2.08. (From the Central Limit Theorem, the test statistic formula is z = (x̄ - μ_X) / (σ_X/√n). For this problem, x̄ = 16, μ_X = 16.43 from the null hypothesis, σ_X = 0.8, and n = 15.) You can find the critical value for α = 0.05 in the normal table (see 15. Tables in the Table of Contents). The z-score for an area to the left equal to 0.05 is midway between -1.65 and -1.64 (0.05 is midway between 0.0505 and 0.0495). The z-score is -1.645. Since -1.645 > -2.08 (which demonstrates that α > p-value), reject Ho. Traditionally, the decision to reject or not reject was done in this way. Today, comparing the two probabilities α and the p-value is very common. For this problem, the p-value, 0.0187, is considerably smaller than α, 0.05. You can be confident about your decision to reject. The graph shows α, the p-value, the test statistic, and the critical value.

[Figure 9.2: graph showing α = 0.05, the p-value, the test statistic, and the critical value.]
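
The critical-value comparison in the Historical Note can be reproduced without a normal table. The sketch below assumes `statistics.NormalDist.inv_cdf` as a stand-in for the table lookup and uses the rounded test statistic -2.08 quoted in the note.

```python
from statistics import NormalDist

alpha = 0.05
z_test = -2.08                          # test statistic quoted in the note
z_crit = NormalDist().inv_cdf(alpha)    # z-score with an area of alpha to its left

print(f"critical value = {z_crit:.3f}")
# Left-tailed test: reject Ho when the test statistic falls below the critical value
print("reject Ho" if z_test < z_crit else "do not reject Ho")
```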


Example  9.12 

A college football coach thought that his players could bench press a mean weight of 275 pounds. It is known that the standard deviation is 55 pounds. Three of his players thought that the mean weight was more than that amount. They asked 30 of their teammates for their estimated maximum lift on the bench press exercise. The data ranged from 205 pounds to 385 pounds. The actual different weights were (frequencies are in parentheses) 205(3); 215(3); 225(1); 241(2); 252(2); 265(2); 275(2); 313(2); 316(5); 338(2); 341(1); 345(2); 368(2); 385(1). (Source: data from Reuben Davis, Kraig Evans, and Scott Gunderson.)

Conduct a hypothesis test using a 2.5% level of significance to determine if the bench press mean is more than 275 pounds.

Solution 

Set up the Hypothesis Test:

Since the problem is about a mean weight, this is a test of a single population mean.

Ho: μ = 275       Ha: μ > 275       This is a right-tailed test.

Calculating the distribution needed:

Random variable: X̄ = the mean weight, in pounds, lifted by the football players.

Distribution for the test: It is normal because σ is known.

x̄ = 286.2 pounds (from the data).

σ = 55 pounds (Always use σ if you know it.) We assume μ = 275 pounds unless our data shows us otherwise.

Calculate the p-value using the normal distribution for a mean and using the sample mean as input (see the calculator instructions below for using the data as input):

p-value = P(x̄ > 286.2) = 0.1323.

Interpretation of the p-value: If Ho is true, then there is a 0.1323 probability (13.23%) that the football players can lift a mean weight of 286.2 pounds or more. Because a 13.23% chance is large enough, a mean weight lift of 286.2 pounds or more is not a rare event.

[Figure 9.3: graph of the p-value; x̄ = 286.2, p-value = 0.1323.]

Compare α and the p-value:
α = 0.025       p-value = 0.1323

Make a decision: Since α < p-value, do not reject Ho.

Conclusion: At the 2.5% level of significance, from the sample data, there is not sufficient evidence to conclude that the true mean weight lifted is more than 275 pounds.

The  p-value  can  easily  be  calculated  using  the  TI-83+  and  the  TI-84  calculators: 

Put the data and frequencies into lists. Press STAT and arrow over to TESTS. Press 1:Z-Test. Arrow
over to Data and press ENTER. Arrow down and enter 275 for $\mu_0$, 55 for $\sigma$, the name of the list where
you put the data, and the name of the list where you put the frequencies. Arrow down to $\mu$: and
arrow over to $> \mu_0$. Press ENTER. Arrow down to Calculate and press ENTER. The calculator not
only calculates the p-value (p = 0.1331, a little different from the above calculation, in which we
used the sample mean rounded to one decimal place instead of the data) but it also calculates the
test statistic (z-score) for the sample mean, the sample mean, and the sample standard deviation.
$\mu > 275$ is the alternate hypothesis. Do this set of instructions again except arrow to Draw (instead
of Calculate). Press ENTER. A shaded graph appears with z = 1.112 (test statistic) and p = 0.1331
(p-value). Make sure when you use Draw that no other equations are highlighted in Y= and the
plots are turned off.
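
As a cross-check on the calculator output, the right-tail area can be recomputed from the reported test statistic. The following is a minimal Python sketch and not part of the original instructions; it assumes only the value z = 1.112 quoted above and uses the standard library's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Test statistic reported by the calculator's Z-Test (taken from the text above).
z = 1.112

# Right-tailed test (Ha: mu > 275): the p-value is the area to the right of z
# under the standard normal curve.
p_value = 1 - NormalDist().cdf(z)

alpha = 0.025
print(f"p-value = {p_value:.4f}")
print("reject H0" if alpha > p_value else "do not reject H0")
```

The printed p-value should agree with the calculator's p = 0.1331 to about four decimal places.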


Example  9.13 

Statistics  students  believe  that  the  mean  score  on  the  first  statistics  test  is  65.  A  statistics  instructor 
thinks  the  mean  score  is  higher  than  65.  He  samples  ten  statistics  students  and  obtains  the  scores 
65;  65;  70;  67;  66;  63;  63;  68;  72;  71.  He  performs  a  hypothesis  test  using  a  5%  level  of  significance. 
The  data  are  from  a  normal  distribution. 

Solution 

Set  up  the  Hypothesis  Test: 

A 5% level of significance means that $\alpha = 0.05$. This is a test of a single population mean.
$H_0$: $\mu = 65$       $H_a$: $\mu > 65$

Since  the  instructor  thinks  the  average  score  is  higher,  use  a  ">  ".  The  ">  "  means  the  test  is 
right-tailed. 

Determine  the  distribution  needed: 

Random variable: $\bar{X}$ = average score on the first statistics test.

Distribution for the test: If you read the problem carefully, you will notice that there is no
population standard deviation given. You are only given n = 10 sample data values. Notice also
that the data come from a normal distribution. This means that the distribution for the test is a
Student's-t.

Use $t_{df}$. Therefore, the distribution for the test is $t_9$ where n = 10 and df = 10 - 1 = 9.

Calculate the p-value using the Student's-t distribution:

p-value = $P(\bar{X} > 67) = 0.0396$ where the sample mean and sample standard deviation are
calculated as 67 and 3.1972 from the data.

Interpretation of the p-value: If the null hypothesis is true, then there is a 0.0396 probability
(3.96%) that the sample mean is 67 or more.

p-value = 0.0396

Figure 9.4

Compare $\alpha$ and the p-value:

Since $\alpha = 0.05$ and p-value = 0.0396, $\alpha >$ p-value.

Make a decision: Since $\alpha >$ p-value, reject $H_0$.

This means you reject $\mu = 65$. In other words, you believe the average test score is more than 65.

Conclusion: At a 5% level of significance, the sample data show sufficient evidence that the mean
(average) test score is more than 65, just as the statistics instructor thinks.

The  p-value  can  easily  be  calculated  using  the  TI-83+  and  the  TI-84  calculators: 

Put the data into a list. Press STAT and arrow over to TESTS. Press 2:T-Test. Arrow over to
Data and press ENTER. Arrow down and enter 65 for $\mu_0$, the name of the list where you put the
data, and 1 for Freq:. Arrow down to $\mu$: and arrow over to $> \mu_0$. Press ENTER. Arrow down
to Calculate and press ENTER. The calculator not only calculates the p-value (p = 0.0396) but it
also calculates the test statistic (t-score) for the sample mean, the sample mean, and the sample
standard deviation. $\mu > 65$ is the alternate hypothesis. Do this set of instructions again except
arrow to Draw (instead of Calculate). Press ENTER. A shaded graph appears with t = 1.9781 (test
statistic) and p = 0.0396 (p-value). Make sure when you use Draw that no other equations are
highlighted in Y= and the plots are turned off.
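
The same quantities can be reproduced without a calculator. The sketch below is an illustration rather than the textbook's method: it uses only the Python standard library, so the Student's-t tail area is approximated by integrating the t density with Simpson's rule. The helper names `t_pdf` and `right_tail_area` are hypothetical, introduced here for the example.

```python
import math
from statistics import mean, stdev

scores = [65, 65, 70, 67, 66, 63, 63, 68, 72, 71]  # data from Example 9.13
mu_0 = 65                                          # mean under the null hypothesis

n = len(scores)
x_bar = mean(scores)   # sample mean
s = stdev(scores)      # sample standard deviation (divides by n - 1)
df = n - 1
t_stat = (x_bar - mu_0) / (s / math.sqrt(n))


def t_pdf(x, df):
    """Density of the Student's-t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def right_tail_area(t, df, steps=2000):
    """P(T > t) for t >= 0: 0.5 minus a Simpson's-rule integral of the density on [0, t]."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3


p_value = right_tail_area(t_stat, df)
print(f"x_bar = {x_bar}, s = {s:.4f}, t = {t_stat:.4f}, p-value = {p_value:.4f}")
```

The output should be close to the values in the example (sample mean 67, s = 3.1972, t = 1.9781, p = 0.0396), up to the accuracy of the numerical integration.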


Example  9.14 

Joon  believes  that  50%  of  first-time  brides  in  the  United  States  are  younger  than  their  grooms. 
She  performs  a  hypothesis  test  to  determine  if  the  percentage  is  the  same  or  different  from  50%. 
Joon samples 100 first-time brides and 53 reply that they are younger than their grooms. For the
hypothesis test, she uses a 1% level of significance.

Solution 

Set  up  the  Hypothesis  Test: 

The 1% level of significance means that $\alpha = 0.01$. This is a test of a single population proportion.
$H_0$: $p = 0.50$       $H_a$: $p \neq 0.50$

The words "is the same or different from" tell you this is a two-tailed test.

Calculate the distribution needed:

Random variable: $P'$ = the percent of first-time brides who are younger than their grooms.

Distribution for the test: The problem contains no mention of a mean. The information is given
in terms of percentages. Use the distribution for $P'$, the estimated proportion.

$P' \sim N\left(p, \sqrt{\frac{p \cdot q}{n}}\right)$. Therefore, $P' \sim N\left(0.5, \sqrt{\frac{0.5 \cdot 0.5}{100}}\right)$ where $p = 0.50$, $q = 1 - p = 0.50$, and
$n = 100$.

Calculate the p-value using the normal distribution for proportions:
p-value = $P(p' < 0.47 \text{ or } p' > 0.53) = 0.5485$
where $x = 53$, $p' = \frac{x}{n} = \frac{53}{100} = 0.53$.

Interpretation  of  the  p-value:  If  the  null  hypothesis  is  true,  there  is  0.5485  probability  (54.85%) 
that  the  sample  (estimated)  proportion  p'  is  0.53  or  more  OR  0.47  or  less  (see  the  graph  below). 

Figure 9.5 (normal curve centered at 0.50; the two shaded tails, below 0.47 and above 0.53, each have area $\frac{1}{2}$(p-value) = 0.27425)


$\mu = p = 0.50$ comes from $H_0$, the null hypothesis.

$p' = 0.53$. Since the curve is symmetrical and the test is two-tailed, the $p'$ for the left tail is equal to
0.50 - 0.03 = 0.47 where $\mu = p = 0.50$. (0.03 is the difference between 0.53 and 0.50.)

Compare $\alpha$ and the p-value:

Since $\alpha = 0.01$ and p-value = 0.5485, $\alpha <$ p-value.

Make a decision: Since $\alpha <$ p-value, you cannot reject $H_0$.

Conclusion:  At  the  1%  level  of  significance,  the  sample  data  do  not  show  sufficient  evidence  that 
the  percentage  of  first-time  brides  that  are  younger  than  their  grooms  is  different  from  50%. 

The  p-value  can  easily  be  calculated  using  the  TI-83+  and  the  TI-84  calculators: 

Press STAT and arrow over to TESTS. Press 5:1-PropZTest. Enter .5 for $p_0$, 53 for x and 100 for
n. Arrow down to Prop and arrow to not equals $p_0$. Press ENTER. Arrow down to Calculate
and press ENTER. The calculator calculates the p-value (p = 0.5485) and the test statistic (z-score).
Prop not equals .5 is the alternate hypothesis. Do this set of instructions again except arrow to
Draw (instead of Calculate). Press ENTER. A shaded graph appears with z = 0.6 (test statistic) and
p = 0.5485 (p-value). Make sure when you use Draw that no other equations are highlighted in
Y= and the plots are turned off.
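
The calculator's 1-PropZTest result can be checked with a few lines of Python. This is a minimal sketch, assuming the normal approximation described in the example; the variable names are illustrative, and `statistics.NormalDist` comes from the standard library.

```python
import math
from statistics import NormalDist

x, n = 53, 100   # brides younger than their grooms, sample size
p_0 = 0.50       # proportion under the null hypothesis
alpha = 0.01

p_prime = x / n                              # estimated proportion
std_error = math.sqrt(p_0 * (1 - p_0) / n)   # standard deviation of P' under H0
z = (p_prime - p_0) / std_error              # test statistic

# Two-tailed test: double the area beyond |z| in one tail.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if alpha > p_value else "do not reject H0")
```

Doubling the one-tail area mirrors the two shaded tails of Figure 9.5.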

The Type I and Type II errors are as follows:

The Type I error is to conclude that the proportion of first-time brides who are younger than their
grooms is different from 50% when, in fact, the proportion is actually 50%. (Reject the null
hypothesis when the null hypothesis is true.)

The Type II error is to conclude that there is not enough evidence that the proportion of first-time brides
who are younger than their grooms differs from 50% when, in fact, the proportion does differ from
50%. (Do not reject the null hypothesis when the null hypothesis is false.)


Example  9.15 


Suppose  a  consumer  group  suspects  that  the  proportion  of  households  that  have  three  cell  phones 
is 30%. A cell phone company has reason to believe that the proportion is not 30%. Before they start
a  big  advertising  campaign,  they  conduct  a  hypothesis  test.  Their  marketing  people  survey  150 
households  with  the  result  that  43  of  the  households  have  three  cell  phones. 

Solution 

Set  up  the  Hypothesis  Test: 
$H_0$: $p = 0.30$       $H_a$: $p \neq 0.30$

Determine the distribution needed:

The random variable is $P'$ = proportion of households that have three cell phones.

The distribution for the hypothesis test is $P' \sim N\left(0.30, \sqrt{\frac{(0.30)(0.70)}{150}}\right)$

Problem 1

The value that helps determine the p-value is $p'$. Calculate $p'$.

Problem 2

What is a success for this problem?

Problem 3

What is the level of significance?

Problem 4

Draw the graph for this problem. Draw the horizontal axis. Label and shade appropriately.

Problem 5

Calculate the p-value.

Problem 6

Make a decision. _____ (Reject/Do not reject) $H_0$ because _____.


The next example is a poem written by a statistics student named Nicole Hart. The solution to the problem
follows the poem. Notice that the hypothesis test is for a single population proportion. This means that the
null and alternate hypotheses use the parameter p. The distribution for the test is normal. The estimated
proportion p' is the proportion of fleas killed to the total fleas found on Fido. This is sample information.
The problem gives a preconceived $\alpha = 0.01$, for comparison, and a 95% confidence interval computation.
The poem is clever and humorous, so please enjoy it!

NOTE: Hypothesis testing problems consist of multiple steps. To help you do the problems,
solution sheets are provided for your use. Look in the Table of Contents Appendix for the topic
"Solution Sheets." If you like, use copies of the appropriate solution sheet for homework
problems.

Example  9.16 

My dog has so many fleas,
They do not come off with ease.
As for shampoo, I have tried many types
Even one called Bubble Hype,
Which only killed 25% of the fleas.
Unfortunately I was not pleased.

I've used all kinds of soap.
Until I had give up hope
Until one day I saw
An ad that put me in awe.

A shampoo used for dogs
Called GOOD ENOUGH to Clean a Hog
Guaranteed to kill more fleas.

I  gave  Fido  a  bath 
And  after  doing  the  math 
His  number  of  fleas 
Started  dropping  by  3's! 

Before  his  shampoo 
I  counted  42. 
At  the  end  of  his  bath, 
I  redid  the  math 

And  the  new  shampoo  had  killed  17  fleas. 
So  now  I  was  pleased. 

Now  it  is  time  for  you  to  have  some  fun 
With  the  level  of  significance  being  .01, 
You  must  help  me  figure  out 
Use  the  new  shampoo  or  go  without? 

Solution 

Set up the Hypothesis Test:
$H_0$: $p = 0.25$       $H_a$: $p > 0.25$

Determine the distribution needed:

In words, CLEARLY state what your random variable $\bar{X}$ or $P'$ represents.

$P'$ = The proportion of fleas that are killed by the new shampoo

State the distribution to use for the test.

Normal: $N\left(0.25, \sqrt{\frac{(0.25)(1 - 0.25)}{42}}\right)$

Test Statistic: z = 2.3163

Calculate the p-value using the normal distribution for proportions:
p-value = 0.0103

In 1 - 2 complete sentences, explain what the p-value means for this problem.

If the null hypothesis is true (the proportion is 0.25), then there is a 0.0103 probability that the
sample (estimated) proportion is 0.4048 $\left(\frac{17}{42}\right)$ or more.
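
The test statistic and p-value quoted above can be recomputed from the counts in the poem. The following is a sketch under the same normal approximation; the variable names are illustrative and not from the text.

```python
import math
from statistics import NormalDist

killed, total = 17, 42   # fleas killed by the new shampoo, fleas counted before the bath
p_0 = 0.25               # kill rate claimed under the null hypothesis

p_prime = killed / total
z = (p_prime - p_0) / math.sqrt(p_0 * (1 - p_0) / total)

# Right-tailed test (Ha: p > 0.25): area to the right of z.
p_value = 1 - NormalDist().cdf(z)

print(f"p' = {p_prime:.4f}, z = {z:.4f}, p-value = {p_value:.4f}")
```

Because the test is right-tailed, only the upper tail area is used; it is not doubled as in a two-tailed test.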

Use  the  previous  information  to  sketch  a  picture  of  this  situation.  CLEARLY,  label  and  scale  the 
horizontal  axis  and  shade  the  region(s)  corresponding  to  the  p-value. 


Figure 9.6 (normal curve centered at 0.25, with 17/42 = 0.4048 marked; test statistic for 17/42: 2.3163)


Compare $\alpha$ and the p-value:

Indicate the correct decision ("reject" or "do not reject" the null hypothesis), the reason for it, and
write an appropriate conclusion, using COMPLETE SENTENCES.

alpha    decision               reason for decision
0.01     Do not reject $H_0$    $\alpha <$ p-value

Table 9.3


Conclusion:  At  the  1%  level  of  significance,  the  sample  data  do  not  show  sufficient  evidence  that 
the  percentage  of  fleas  that  are  killed  by  the  new  shampoo  is  more  than  25%. 

Construct a 95% Confidence Interval for the true mean or proportion. Include a sketch of the
graph of the situation. Label the point estimate and the lower and upper bounds of the Confidence
Interval.

Figure 9.7 (interval from 0.26 to 0.55, with the point estimate 17/42 marked)


Confidence  Interval:  (0.26,0.55)  We  are  95%  confident  that  the  true  population  proportion  p  of 
fleas  that  are  killed  by  the  new  shampoo  is  between  26%  and  55%. 
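
The interval above can be reproduced as follows. This sketch assumes the usual normal-approximation interval for a proportion, p' plus or minus z times the square root of p'q'/n; that formula is an assumption here, since this solution states only the resulting bounds.

```python
import math
from statistics import NormalDist

killed, total = 17, 42
confidence = 0.95

p_prime = killed / total
# Critical value for the chosen confidence level (about 1.96 for 95%).
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
# The error bound uses the sample proportion p', not the hypothesized 0.25.
error_bound = z_crit * math.sqrt(p_prime * (1 - p_prime) / total)

lower, upper = p_prime - error_bound, p_prime + error_bound
print(f"({lower:.2f}, {upper:.2f})")
```

Note that the hypothesis test uses the null value 0.25 in its standard deviation, while the confidence interval uses the sample proportion.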

NOTE:  This  test  result  is  not  very  definitive  since  the  p-value  is  very  close  to  alpha.  In  reality,  one 
would  probably  do  more  tests  by  giving  the  dog  another  bath  after  the  fleas  have  had  a  chance  to 
return.

9.12 Summary of Formulas

$H_0$ and $H_a$ are contradictory.

If $H_0$ has:                        then $H_a$ has:
equal (=)                            not equal ($\neq$) or greater than (>) or less than (<)
greater than or equal to ($\geq$)    less than (<)
less than or equal to ($\leq$)       greater than (>)

Table 9.4


If $\alpha <$ p-value, then do not reject $H_0$.

If $\alpha >$ p-value, then reject $H_0$.

$\alpha$ is preconceived. Its value is set before the hypothesis test starts. The p-value is calculated from the data.

$\alpha$ = probability of a Type I error = P(Type I error) = probability of rejecting the null hypothesis when the
null hypothesis is true.

$\beta$ = probability of a Type II error = P(Type II error) = probability of not rejecting the null hypothesis when
the null hypothesis is false.

If there is no given preconceived $\alpha$, then use $\alpha = 0.05$.
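
The decision rule can be written as a small function. This is an illustrative sketch; the function name `decide` is hypothetical, and the handling of the case where alpha exactly equals the p-value is an assumption, since the summary does not address it.

```python
def decide(p_value, alpha=0.05):
    """Return the decision for a hypothesis test.

    alpha defaults to 0.05, the value the summary says to use when no
    preconceived alpha is given.
    """
    if alpha > p_value:
        return "reject H0"
    # Assumption: the summary does not cover alpha == p_value; this sketch
    # groups that case with "do not reject".
    return "do not reject H0"


print(decide(0.0396))               # Example 9.13, alpha = 0.05
print(decide(0.5485, alpha=0.01))   # Example 9.14
print(decide(0.0103, alpha=0.01))   # Example 9.16
```
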
Types  of  Hypothesis  Tests 

•  Single  population  mean,  known  population  variance  (or  standard  deviation):  Normal  test. 

•  Single  population  mean,  unknown  population  variance  (or  standard  deviation):  Student's-t  test. 

•  Single population proportion: Normal test.
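
For reference, the test statistics behind these three tests can be written as follows. This is a sketch using the standard formulas; the notation is assumed here ($\mu_0$ and $p_0$ for the values stated in the null hypothesis, $s$ for the sample standard deviation, $q_0 = 1 - p_0$), since the summary lists only the names of the tests.

```latex
\begin{align*}
\text{Single mean, known } \sigma: \quad & z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \\
\text{Single mean, unknown } \sigma: \quad & t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1 \\
\text{Single proportion}: \quad & z = \frac{p' - p_0}{\sqrt{p_0 q_0 / n}}
\end{align*}
```

For instance, Example 9.14 gives $z = (0.53 - 0.50) / \sqrt{(0.50)(0.50)/100} = 0.6$.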


This content is available online at <http://cnx.org/content/m16996/1.9/>.


9.13 Practice 1: Single Mean, Known Population Standard Deviation

9.13.1  Student  Learning  Outcomes 

•  The student will conduct a hypothesis test of a single mean with known population standard
deviation.


9.13.2  Given 

Suppose  that  a  recent  article  stated  that  the  mean  time  spent  in  jail  by  a  first-time  convicted  burglar  is  2.5 
years.  A  study  was  then  done  to  see  if  the  mean  time  has  increased  in  the  new  century.  A  random  sample 
of  26  first-time  convicted  burglars  in  a  recent  year  was  picked.  The  mean  length  of  time  in  jail  from  the 
survey  was  3  years  with  a  standard  deviation  of  1.8  years.  Suppose  that  it  is  somehow  known  that  the 
population standard deviation is 1.5. Conduct a hypothesis test to determine if the mean length of jail time
has  increased.  The  distribution  of  the  population  is  normal. 


9.13.3  Hypothesis  Testing:  Single  Mean 


Exercise 9.13.1  (Solution on p. 423.)

Is this a test of means or proportions?

Exercise 9.13.2  (Solution on p. 423.)

State the null and alternative hypotheses.

a. $H_0$:

b. $H_a$:

Exercise 9.13.3  (Solution on p. 423.)

Is this a right-tailed, left-tailed, or two-tailed test? How do you know?

Exercise 9.13.4  (Solution on p. 423.)

What symbol represents the Random Variable for this test?

Exercise 9.13.5  (Solution on p. 423.)

In words, define the Random Variable for this test.

Exercise 9.13.6  (Solution on p. 423.)

Is the population standard deviation known and, if so, what is it?

Exercise 9.13.7  (Solution on p. 423.)

Calculate the following:

a. $\bar{x}$ =

b. $\sigma$ =

c. $s_x$ =

d. n =


Exercise  9.13.8  (Solution  on  p.  423.) 

Since both $\sigma$ and $s_x$ are given, which should be used? In 1 - 2 complete sentences, explain why.

Exercise  9.13.9  (Solution  on  p.  423.) 

State  the  distribution  to  use  for  the  hypothesis  test. 

Exercise  9.13.10 

Sketch a graph of the situation. Label the horizontal axis. Mark the hypothesized mean and the
sample mean $\bar{x}$. Shade the area corresponding to the p-value.

This content is available online at <http://cnx.org/content/m17004/1.11/>.


Exercise 9.13.11  (Solution on p. 423.)

Find the p-value.

Exercise 9.13.12  (Solution on p. 423.)

At a pre-conceived $\alpha = 0.05$, what is your:

a.  Decision: 

b.  Reason  for  the  decision: 

c.  Conclusion  (write  out  in  a  complete  sentence): 


9.13.4 Discussion Questions

Exercise 9.13.13

Does it appear that the mean jail time spent for first time convicted burglars has increased? Why
or why not?

9.14 Practice 2: Single Mean, Unknown Population Standard Deviation

9.14.1  Student  Learning  Outcomes 

•  The student will conduct a hypothesis test of a single mean with unknown population standard
deviation.


9.14.2  Given 

A  random  survey  of  75  death  row  inmates  revealed  that  the  mean  length  of  time  on  death  row  is  17.4  years 
with  a  standard  deviation  of  6.3  years.  Conduct  a  hypothesis  test  to  determine  if  the  population  mean  time 
on  death  row  could  likely  be  15  years. 


9.14.3  Hypothesis  Testing:  Single  Mean 

Exercise  9.14.1  (Solution  on  p.  423.) 

Is  this  a  test  of  means  or  proportions? 

Exercise  9.14.2  (Solution  on  p.  423.) 

State the null and alternative hypotheses.

a. $H_0$:

b. $H_a$:

Exercise 9.14.3  (Solution on p. 423.)

Is this a right-tailed, left-tailed, or two-tailed test? How do you know?

Exercise 9.14.4  (Solution on p. 423.)

What symbol represents the Random Variable for this test?

Exercise 9.14.5  (Solution on p. 423.)

In words, define the Random Variable for this test.

Exercise 9.14.6  (Solution on p. 423.)

Is the population standard deviation known and, if so, what is it?

Exercise 9.14.7  (Solution on p. 423.)

Calculate the following:

a. $\bar{x}$ =

b. 6.3 =

c. n =

Exercise  9.14.8  (Solution  on  p.  424.) 

Which  test  should  be  used?  In  1  -2  complete  sentences,  explain  why 

Exercise  9.14.9  (Solution  on  p.  424.) 

State  the  distribution  to  use  for  the  hypothesis  test. 

Exercise  9.14.10 

Sketch a graph of the situation. Label the horizontal axis. Mark the hypothesized mean and the
sample mean, $\bar{x}$. Shade the area corresponding to the p-value.

This content is available online at <http://cnx.org/content/m17016/1.13/>.

Figure 9.8


Exercise 9.14.11  (Solution on p. 424.)

Find the p-value.

Exercise 9.14.12  (Solution on p. 424.)

At a pre-conceived $\alpha = 0.05$, what is your:

a.  Decision: 

b.  Reason  for  the  decision: 

c.  Conclusion  (write  out  in  a  complete  sentence): 


9.14.4  Discussion  Question 

Does it appear that the mean time on death row could be 15 years? Why or why not?

9.15 Practice 3: Single Proportion

9.15.1  Student  Learning  Outcomes 

•  The  student  will  conduct  a  hypothesis  test  of  a  single  population  proportion. 


9.15.2  Given 

The National Institute of Mental Health published an article stating that in any one-year period,
approximately 9.5 percent of American adults suffer from depression or a depressive illness
(http://www.nimh.nih.gov/publicat/depression.cfm). Suppose that in a survey of 100 people in a certain
town, seven of them suffered from depression or a depressive illness. Conduct a hypothesis test to determine
if the true proportion of people in that town suffering from depression or a depressive illness is lower
than the percent in the general adult American population.

9.15.3  Hypothesis  Testing:  Single  Proportion 
Exercise 9.15.1  (Solution on p. 424.)

Is this a test of means or proportions?

Exercise 9.15.2  (Solution on p. 424.)

State the null and alternative hypotheses.

a. $H_0$:

b. $H_a$:

Exercise 9.15.3  (Solution on p. 424.)

Is this a right-tailed, left-tailed, or two-tailed test? How do you know?

Exercise 9.15.4  (Solution on p. 424.)

What symbol represents the Random Variable for this test?

Exercise 9.15.5  (Solution on p. 424.)

In words, define the Random Variable for this test.

Exercise 9.15.6  (Solution on p. 424.)

Calculate the following:

a. x =

b. n =

c. p' =

Exercise 9.15.7  (Solution on p. 424.)

Calculate $\sigma_{p'}$. Make sure to show how you set up the formula.

Exercise 9.15.8  (Solution on p. 424.)

State the distribution to use for the hypothesis test.

Exercise 9.15.9

Sketch a graph of the situation. Label the horizontal axis. Mark the hypothesized mean and the
sample proportion, p-hat. Shade the area corresponding to the p-value.

Exercise 9.15.10  (Solution on p. 424.)

Find the p-value.

Exercise 9.15.11  (Solution on p. 424.)

At a pre-conceived $\alpha = 0.05$, what is your:

a.  Decision: 

b.  Reason  for  the  decision: 

c.  Conclusion  (write  out  in  a  complete  sentence): 


9.15.4 Discussion Question

Exercise 9.15.12

Does it appear that the proportion of people in that town with depression or a depressive illness
is lower than in the general adult American population? Why or why not?

9.16 Homework

Exercise  9.16.1  (Solution  on  p.  424.) 

Some  of  the  statements  below  refer  to  the  null  hypothesis,  some  to  the  alternate  hypothesis. 

State the null hypothesis, $H_0$, and the alternative hypothesis, $H_a$, in terms of the appropriate
parameter ($\mu$ or $p$).

a.  The  mean  number  of  years  Americans  work  before  retiring  is  34. 

b.  At  most  60%  of  Americans  vote  in  presidential  elections. 

c.  The  mean  starting  salary  for  San  Jose  State  University  graduates  is  at  least  $100,000  per  year. 

d.  29%  of  high  school  seniors  get  drunk  each  month. 

e.  Fewer  than  5%  of  adults  ride  the  bus  to  work  in  Los  Angeles. 

f.  The  mean  number  of  cars  a  person  owns  in  her  lifetime  is  not  more  than  10. 

g.  About  half  of  Americans  prefer  to  live  away  from  cities,  given  the  choice. 

h.  Europeans  have  a  mean  paid  vacation  each  year  of  six  weeks. 

i.  The  chance  of  developing  breast  cancer  is  under  11%  for  women. 

j.  Private universities' mean tuition cost is more than $20,000 per year.

Exercise  9.16.2  (Solution  on  p.  425.) 

For  (a)  -  (j)  above,  state  the  Type  I  and  Type  II  errors  in  complete  sentences. 

Exercise  9.16.3 

For  (a)  -  (j)  above,  in  complete  sentences: 

a.  State  a  consequence  of  committing  a  Type  I  error. 

b.  State  a  consequence  of  committing  a  Type  II  error. 


Directions: For each of the word problems, use a solution sheet to do the hypothesis test. The
solution sheet is found in 14. Appendix (online book version: the link is "Solution Sheets"; PDF
book version: look under 14.5 Solution Sheets). Please feel free to make copies of the solution
sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files.

NOTE:  If  you  are  using  a  student's-t  distribution  for  a  homework  problem  below,  you  may  assume 
that  the  underlying  population  is  normally  distributed.  (In  general,  you  must  first  prove  that 
assumption,  though.) 

Exercise  9.16.4 

A  particular  brand  of  tires  claims  that  its  deluxe  tire  averages  at  least  50,000  miles  before  it  needs 
to  be  replaced.  From  past  studies  of  this  tire,  the  standard  deviation  is  known  to  be  8000.  A  survey 
of  owners  of  that  tire  design  is  conducted.  From  the  28  tires  surveyed,  the  mean  lifespan  was 
46,500  miles  with  a  standard  deviation  of  9800  miles.  Do  the  data  support  the  claim  at  the  5% 
level? 

Exercise  9.16.5  (Solution  on  p.  425.) 

From  generation  to  generation,  the  mean  age  when  smokers  first  start  to  smoke  varies.  However, 
the standard deviation of that age remains constant at around 2.1 years. A survey of 40 smokers
of  this  generation  was  done  to  see  if  the  mean  starting  age  is  at  least  19.  The  sample  mean  was 
18.1  with  a  sample  standard  deviation  of  1.3.  Do  the  data  support  the  claim  at  the  5%  level? 

This content is available online at <http://cnx.org/content/m17001/1.14/>.




Exercise  9.16.6 

The cost of a daily newspaper varies from city to city. However, the variation among prices
remains steady with a standard deviation of 20¢. A study was done to test the claim that the mean
cost of a daily newspaper is $1.00. Twelve costs yield a mean cost of 95¢ with a standard deviation
of 18¢. Do the data support the claim at the 1% level?

Exercise  9.16.7  (Solution  on  p.  425.) 

An  article  in  the  San  Jose  Mercury  News  stated  that  students  in  the  California  state  university 
system  take  4.5  years,  on  average,  to  finish  their  undergraduate  degrees.  Suppose  you  believe  that 
the  mean  time  is  longer.  You  conduct  a  survey  of  49  students  and  obtain  a  sample  mean  of  5.1  with 
a sample standard deviation of 1.2. Do the data support your claim at the 1% level?

Exercise  9.16.8 

The  mean  number  of  sick  days  an  employee  takes  per  year  is  believed  to  be  about  10.  Members 
of  a  personnel  department  do  not  believe  this  figure.  They  randomly  survey  8  employees.  The 
number  of  sick  days  they  took  for  the  past  year  are  as  follows:  12;  4;  15;  3;  11;  8;  6;  8.  Let  x  =  the 
number of sick days they took for the past year. Should the personnel team believe that the mean
number is about 10?

Exercise  9.16.9  (Solution  on  p.  425.) 

In  1955,  Life  Magazine  reported  that  the  25  year-old  mother  of  three  worked,  on  average,  an  80 
hour  week.  Recently,  many  groups  have  been  studying  whether  or  not  the  women's  movement 
has,  in  fact,  resulted  in  an  increase  in  the  average  work  week  for  women  (combining  employment 
and  at-home  work).  Suppose  a  study  was  done  to  determine  if  the  mean  work  week  has  increased. 
81 women were surveyed with the following results. The sample mean was 83; the sample standard
deviation was 10. Does it appear that the mean work week has increased for women at the
5%  level? 

Exercise  9.16.10 

Your  statistics  instructor  claims  that  60  percent  of  the  students  who  take  her  Elementary  Statistics 
class  go  through  life  feeling  more  enriched.  For  some  reason  that  she  can't  quite  figure  out,  most 
people  don't  believe  her.  You  decide  to  check  this  out  on  your  own.  You  randomly  survey  64  of 
her  past  Elementary  Statistics  students  and  find  that  34  feel  more  enriched  as  a  result  of  her  class. 
Now,  what  do  you  think? 

Exercise  9.16.11  (Solution  on  p.  425.) 

A  Nissan  Motor  Corporation  advertisement  read,  "The  average  man's  I.Q.  is  107.  The  average 
brown  trout's  I.Q.  is  4.  So  why  can't  man  catch  brown  trout?"  Suppose  you  believe  that  the  brown 
trout's  mean  I.Q.  is  greater  than  4.  You  catch  12  brown  trout.  A  fish  psychologist  determines  the 
I.Q.s  as  follows:  5;  4;  7;  3;  6;  4;  5;  3;  6;  3;  8;  5.  Conduct  a  hypothesis  test  of  your  belief. 

Exercise  9.16.12 

Refer to the previous problem. Conduct a hypothesis test to see if your decision and conclusion
would  change  if  your  belief  were  that  the  brown  trout's  mean  I.Q.  is  not  4. 

Exercise  9.16.13  (Solution  on  p.  425.) 

According to an article in Newsweek, the natural ratio of girls to boys is 100:105. In China, the
birth ratio is 100:114 (46.7% girls). Suppose you don't believe the reported figures of the percent
of girls born in China. You conduct a study. In this study, you count the number of girls and boys
born in 150 randomly chosen recent births. There are 60 girls and 90 boys born of the 150. Based
on your study, do you believe that the percent of girls born in China is 46.7?

Exercise  9.16.14 

A  poll  done  for  Newsweek  found  that  13%  of  Americans  have  seen  or  sensed  the  presence  of  an 
angel.  A  contingent  doubts  that  the  percent  is  really  that  high.  It  conducts  its  own  survey.  Out 
of 76 Americans surveyed, only 2 had seen or sensed the presence of an angel. As a result of the
contingent's survey, would you agree with the Newsweek poll? In complete sentences, also give
three reasons why the two polls might give different results.

Exercise  9.16.15  (Solution  on  p.  425.) 

The  mean  work  week  for  engineers  in  a  start-up  company  is  believed  to  be  about  60  hours.  A 
newly  hired  engineer  hopes  that  it's  shorter.  She  asks  10  engineering  friends  in  start-ups  for  the 
lengths  of  their  mean  work  weeks.  Based  on  the  results  that  follow,  should  she  count  on  the  mean 
work  week  to  be  shorter  than  60  hours? 

Data  (length  of  mean  work  week):  70;  45;  55;  60;  65;  55;  55;  60;  50;  55. 
Exercise  9.16.16 

Use  the  "Lap  time"  data  for  Lap  4  (see  Table  of  Contents)  to  test  the  claim  that  Terri  finishes  Lap 
4,  on  average,  in  less  than  129  seconds.  Use  all  twenty  races  given. 

Exercise  9.16.17 

Use  the  "Initial  Public  Offering"  data  (see  Table  of  Contents)  to  test  the  claim  that  the  mean  offer 
price  was  $18  per  share.  Do  not  use  all  the  data.  Use  your  random  number  generator  to  randomly 
survey  15  prices. 

NOTE:  The  following  questions  were  written  by  past  students.  They  are  excellent  problems! 
Exercise  9.16.18 

18.  "Asian  Family  Reunion"  by  Chau  Nguyen 

Every  two  years  it  comes  around 
We  all  get  together  from  different  towns. 
In  my  honest  opinion 
It's  not  a  typical  family  reunion 
Not  forty,  or  fifty,  or  sixty. 
But how about seventy companions!
The  kids  would  play,  scream,  and  shout 
One  minute  they're  happy,  another  they'll  pout. 
The  teenagers  would  look,  stare,  and  compare 
From  how  they  look  to  what  they  wear. 
The  men  would  chat  about  their  business 
That  they  make  more,  but  never  less. 
Money  is  always  their  subject 
And  there's  always  talk  of  more  new  projects. 
The  women  get  tired  from  all  of  the  chats 
They  head  to  the  kitchen  to  set  out  the  mats. 
Some  would  sit  and  some  would  stand 
Eating  and  talking  with  plates  in  their  hands. 
Then  come  the  games  and  the  songs 
And  suddenly,  everyone  gets  along! 
With  all  that  laughter,  it's  sad  to  say 
That it always ends in the same old way.
They  hug  and  kiss  and  say  "good-bye" 
And  then  they  all  begin  to  cry! 
I say that 60 percent shed their tears
But my mom counted 35 people this year.
She said that boys and men will always have their pride.
So we won't ever see them cry.
I myself don't think she's correct.
So could you please try this problem to see if you object?

Exercise 9.16.19 (Solution on p. 425.)

"The Problem with Angels" by Cyndy Dowling

Although  this  problem  is  wholly  mine, 
The  catalyst  came  from  the  magazine,  Time. 
On  the  magazine  cover  I  did  find 
The  realm  of  angels  tickling  my  mind. 

Inside, 69% I found to be
In  angels,  Americans  do  believe. 

Then,  it  was  time  to  rise  to  the  task. 
Ninety-five  high  school  and  college  students  I  did  ask. 
Viewing  all  as  one  group, 
Random  sampling  to  get  the  scoop. 

So,  I  asked  each  to  be  true, 
"Do  you  believe  in  angels?"    Tell  me,  do! 

Hypothesizing  at  the  start. 
Totally  believing  in  my  heart 
That  the  proportion  who  said  yes 
Would be equal on this test.

Lo  and  behold,  seventy-three  did  arrive. 
Out  of  the  sample  of  ninety-five. 
Now  your  job  has  just  begun. 
Solve  this  problem  and  have  some  fun. 

Exercise  9.16.20 

"Blowing  Bubbles"  by  Sondra  Prull 

Studying  stats  just  made  me  tense, 
I  had  to  find  some  sane  defense. 
Some  light  and  lifting  simple  play 
To  float  my  math  anxiety  away. 

Blowing  bubbles  lifts  me  high 
Takes  my  troubles  to  the  sky. 
POIK!  They're  gone,  with  all  my  stress 
Bubble therapy is the best.

The  label  said  each  time  I  blew 
The  average  number  of  bubbles  would  be  at  least  22. 
I  blew  and  blew  and  this  I  found 
From 64 blows, they all are round!

But the number of bubbles in 64 blows
Varied  widely,  this  I  know. 
20  per  blow  became  the  mean 
They  deviated  by  6,  and  not  16. 

From  counting  bubbles,  I  sure  did  relax 
But  now  I  give  to  you  your  task. 
Was 22 a reasonable guess?
Find  the  answer  and  pass  this  test! 

Exercise  9.16.21 

"Dalmatian Darnation" by Kathy Sparling

A  greedy  dog  breeder  named  Spreckles 
Bred  puppies  with  numerous  freckles 
The  Dalmatians  he  sought 
Possessed spot upon spot
The more spots, he thought, the more shekels

His  competitors  did  not  agree 
That  freckles  would  increase  the  fee. 
They  said,   ''Spots  are  quite  nice 
But  they  don't  affect  price; 
One  should  breed  for  improved  pedigree.'' 

The  breeders  decided  to  prove 
This  strategy  was  a  wrong  move. 
Breeding  only  for  spots 
Would  wreak  havoc,  they  thought. 
His  theory  they  want  to  disprove. 

They  proposed  a  contest  to  Spreckles 
Comparing  dog  prices  to  freckles. 
In  records  they  looked  up 
One hundred one pups:
Dalmatians that fetched the most shekels.

They  asked  Mr.  Spreckles  to  name 
An  average  spot  count  he'd  claim 
To bring in big bucks.
Said  Spreckles,   ''Well,  shucks, 
It's  for  one  hundred  one  that  I  aim.'' 

Said  an  amateur  statistician 
Who  wanted  to  help  with  this  mission. 
''Twenty-one  for  the  sample 
Standard deviation's ample:
They examined one hundred and one
Dalmatians that fetched a good sum.
They counted each spot,
Mark,  freckle  and  dot 
And  tallied  up  every  one. 

Instead  of  one  hundred  one  spots 
They  averaged  ninety  six  dots 
Can  they  muzzle  Spreckles' 
Obsession  with  freckles 
Based  on  all  the  dog  data  they've  got? 

Exercise 9.16.22

"Macaroni and Cheese, please!!" by Nedda Misherghi and Rachelle Hall

As  a  poor  starving  student  I  don't  have  much  money  to  spend  for  even  the  bare  necessities.  So 
my  favorite  and  main  staple  food  is  macaroni  and  cheese.  It's  high  in  taste  and  low  in  cost  and 
nutritional  value. 

One  day,  as  I  sat  down  to  determine  the  meaning  of  life,  I  got  a  serious  craving  for  this,  oh,  so 
important,  food  of  my  life.  So  I  went  down  the  street  to  Greatway  to  get  a  box  of  macaroni  and 
cheese,  but  it  was  SO  expensive!  $2.02  !!!  Can  you  believe  it?  It  made  me  stop  and  think.  The 
world is changing fast. I had thought that the mean cost of a box (the normal size, not some
super-gigantic-family-value-pack) was at most $1, but now I wasn't so sure. However, I was determined
to  find  out.  I  went  to  53  of  the  closest  grocery  stores  and  surveyed  the  prices  of  macaroni  and 
cheese.  Here  are  the  data  I  wrote  in  my  notebook: 

Price  per  box  of  Mac  and  Cheese: 

• 5 stores @ $2.02
• 15 stores @ $0.25
• 3 stores @ $1.29
• 6 stores @ $0.35
• 4 stores @ $2.27
• 7 stores @ $1.50
• 5 stores @ $1.89
• 8 stores @ $0.75

I  could  see  that  the  costs  varied  but  I  had  to  sit  down  to  figure  out  whether  or  not  I  was  right.  If 
it  does  turn  out  that  this  mouth-watering  dish  is  at  most  $1,  then  I'll  throw  a  big  cheesy  party  in 
our  next  statistics  lab,  with  enough  macaroni  and  cheese  for  just  me.  (After  all,  as  a  poor  starving 
student  I  can't  be  expected  to  feed  our  class  of  animals!) 
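Because the macaroni-and-cheese prices are given as a frequency list, the sample mean and standard deviation have to be weighted by the number of stores. The sketch below shows one way to do that and then forms a t statistic against the claimed mean of $1. It assumes Ho: μ ≤ 1 and Ha: μ > 1, which is one reading of the story, and the variable names are illustrative only.

```python
from math import sqrt

# (number of stores, price per box) pairs copied from the notebook list
price_counts = [(5, 2.02), (15, 0.25), (3, 1.29), (6, 0.35),
                (4, 2.27), (7, 1.50), (5, 1.89), (8, 0.75)]

n = sum(count for count, _ in price_counts)
x_bar = sum(count * price for count, price in price_counts) / n

# Weighted sum of squared deviations, then the usual n - 1 divisor
ss = sum(count * (price - x_bar) ** 2 for count, price in price_counts)
s = sqrt(ss / (n - 1))

t_stat = (x_bar - 1.00) / (s / sqrt(n))   # assumed Ho: mu <= 1, Ha: mu > 1
print(f"n = {n}, x-bar = {x_bar:.4f}, s = {s:.4f}, t = {t_stat:.2f}")
```

The store counts add up to 53, which matches the number of stores in the story and is a quick check that the list was copied correctly.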

Exercise  9.16.23  (Solution  on  p.  426.) 

"William  Shakespeare:  The  Tragedy  of  Hamlet,  Prince  of  Denmark"  by  Jacqueline  Ghodsi 

THE  CHARACTERS  (in  order  of  appearance): 

•  HAMLET,  Prince  of  Denmark  and  student  of  Statistics 

•  POLONIUS,  Hamlet's  tutor 

• HORATIO, friend to Hamlet and fellow student

Scene:  The  great  library  of  the  castle,  in  which  Hamlet  does  his  lessons 
Act I

(The day is fair, but the face of Hamlet is clouded. He paces the large room. His tutor, Polonius, is
reprimanding Hamlet regarding the latter's recent experience. Horatio is seated at the large table
at right stage.)

POLONIUS:  My  Lord,  how  cans't  thou  admit  that  thou  hast  seen  a  ghost!  It  is  but  a  figment  of 
your  imagination! 

HAMLET: I beg to differ; I know of a certainty that five-and-seventy in one hundred of us,
condemned to the whips and scorns of time as we are, have gazed upon a spirit of health, or goblin
damn'd, be their intents wicked or charitable.

POLONIUS: If thou doest insist upon thy wretched vision then let me invest your time; be true
to thy work and speak to me through the reason of the null and alternate hypotheses. (He turns
to Horatio.) Did not Hamlet himself say, "What piece of work is man, how noble in reason, how
infinite in faculties?" Then let not this foolishness persist. Go, Horatio, make a survey of
three-and-sixty and discover what the true proportion be. For my part, I will never succumb to this
fantasy, but deem man to be devoid of all reason should thy proposal of at least five-and-seventy in
one hundred hold true.

HORATIO  (to  Hamlet):  What  should  we  do,  my  Lord? 
HAMLET:  Go  to  thy  purpose,  Horatio. 
HORATIO:  To  what  end,  my  Lord? 

HAMLET:  That  you  must  teach  me.  But  let  me  conjure  you  by  the  rights  of  our  fellowship,  by  the 
consonance  of  our  youth,  but  the  obligation  of  our  ever-preserved  love,  be  even  and  direct  with 
me,  whether  I  am  right  or  no. 

(Horatio  exits,  followed  by  Polonius,  leaving  Hamlet  to  ponder  alone.) 
Act  II 

(The  next  day,  Hamlet  awaits  anxiously  the  presence  of  his  friend,  Horatio.  Polonius  enters  and 
places  some  books  upon  the  table  just  a  moment  before  Horatio  enters.) 

POLONIUS:  So,  Horatio,  what  is  it  thou  didst  reveal  through  thy  deliberations? 

HORATIO:  In  a  random  survey,  for  which  purpose  thou  thyself  sent  me  forth,  I  did  discover  that 
one-and-forty  believe  fervently  that  the  spirits  of  the  dead  walk  with  us.  Before  my  God,  I  might 
not  this  believe,  without  the  sensible  and  true  avouch  of  mine  own  eyes. 

POLONIUS:  Give  thine  own  thoughts  no  tongue,  Horatio.  (Polonius  turns  to  Hamlet.)  But  look 
to't I charge you, my Lord. Come Horatio, let us go together, for this is not our test. (Horatio and
Polonius  leave  together.) 

HAMLET:  To  reject,  or  not  reject,  that  is  the  question:  whether  'tis  nobler  in  the  mind  to  suffer  the 
slings  and  arrows  of  outrageous  statistics,  or  to  take  arms  against  a  sea  of  data,  and,  by  opposing, 
end  them.  (Hamlet  resignedly  attends  to  his  task.) 

(Curtain  falls) 

Exercise 9.16.24

"Untitled" by Stephen Chen

I've often wondered how software is released and sold to the public. Ironically, I work for a
company that sells products with known problems. Unfortunately, most of the problems are difficult
to create, which makes them difficult to fix. I usually use the test program X, which tests the
product, to try to create a specific problem. When the test program is run to make an error occur, the
likelihood of generating an error is 1%.

So, armed with this knowledge, I wrote a new test program Y that will generate the same error that
test program X creates, but more often. To find out if my test program is better than the original,
so that I can convince the management that I'm right, I ran my test program to find out how often
I can generate the same error. When I ran my test program 50 times, I generated the error twice.
While this may not seem much better, I think that I can convince the management to use my test
program instead of the original test program. Am I right?
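With n = 50 and a hypothesized error rate of 1%, the expected number of errors is only 0.5, so the normal approximation is doubtful here. One alternative, sketched below, is to compute the binomial tail probability exactly. This is offered as an assumption about how the problem could be approached, not as the textbook's method; the hypotheses are taken to be Ho: p = 0.01 and Ha: p > 0.01, and `binom_upper_tail` is a hypothetical helper name.

```python
from math import comb

def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), computed exactly."""
    lower = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k))
    return 1 - lower

# Program Y produced the error 2 times in 50 runs
p_value = binom_upper_tail(k=2, n=50, p=0.01)
print(f"P(X >= 2 | p = 0.01) = {p_value:.4f}")
```

The result is the probability of seeing two or more errors in 50 runs if program Y were no better than program X, which is the quantity to compare with a chosen significance level.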

Exercise  9.16.25  (Solution  on  p.  426.) 

Japanese  Girls'  Names 

by  Kumi  Furuichi 

It used to be very typical for Japanese girls' names to end with "ko." (The trend might have
started  around  my  grandmothers'  generation  and  its  peak  might  have  been  around  my  mother's 
generation.)  "Ko"  means  "child"  in  Chinese  character.  Parents  would  name  their  daughters  with 
"ko"  attaching  to  other  Chinese  characters  which  have  meanings  that  they  want  their  daughters 
to  become,  such  as  Sachiko  -  a  happy  child,  Yoshiko  -  a  good  child,  Yasuko  -  a  healthy  child,  and 
so  on. 

However, I noticed recently that only two out of nine of my Japanese girlfriends at this school have
names  which  end  with  "ko."  More  and  more,  parents  seem  to  have  become  creative,  modernized, 
and,  sometimes,  westernized  in  naming  their  children. 

I have a feeling that, while 70 percent or more of my mother's generation would have names with
"ko" at the end, the proportion has dropped among my peers. I wrote down all my Japanese
friends', ex-classmates', co-workers', and acquaintances' names that I could remember. Below are
the names. (Some are repeats.) Test to see if the proportion has dropped for this generation.

Ai, Akemi, Akiko, Ayumi, Chiaki, Chie, Eiko, Eri, Eriko, Fumiko, Harumi, Hitomi, Hiroko, Hiroko,
Hidemi, Hisako, Hinako, Izumi, Izumi, Junko, Junko, Kana, Kanako, Kanayo, Kayo, Kayoko,
Kazumi,  Keiko,  Keiko,  Kei,  Kumi,  Kumiko,  Kyoko,  Kyoko,  Madoka,  Maho,  Mai,  Maiko,  Maki, 
Miki,  Miki,  Mikiko,  Mina,  Minako,  Miyako,  Momoko,  Nana,  Naoko,  Naoko,  Naoko,  Noriko, 
Rieko,  Rika,  Rika,  Rumiko,  Rei,  Reiko,  Reiko,  Sachiko,  Sachiko,  Sachiyo,  Saki,  Sayaka,  Sayoko, 
Sayuri,  Seiko,  Shiho,  Shizuka,  Sumiko,  Takako,  Takako,  Tomoe,  Tomoe,  Tomoko,  Touko,  Yasuko, 
Yasuko,  Yasuyo,  Yoko,  Yoko,  Yoko,  Yoshiko,  Yoshiko,  Yoshiko,  Yuka,  Yuki,  Yuki,  Yukiko,  Yuko, 
Yuko. 
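Before any test can be run on the name list, the names ending in "ko" have to be tallied. Counting by hand is error-prone, so a short script such as the following sketch may help. The list is transcribed from the exercise (with the line-break hyphen in "Hiroko" joined), and the variable names are illustrative.

```python
names_text = """
Ai, Akemi, Akiko, Ayumi, Chiaki, Chie, Eiko, Eri, Eriko, Fumiko, Harumi, Hitomi, Hiroko, Hiroko,
Hidemi, Hisako, Hinako, Izumi, Izumi, Junko, Junko, Kana, Kanako, Kanayo, Kayo, Kayoko,
Kazumi, Keiko, Keiko, Kei, Kumi, Kumiko, Kyoko, Kyoko, Madoka, Maho, Mai, Maiko, Maki,
Miki, Miki, Mikiko, Mina, Minako, Miyako, Momoko, Nana, Naoko, Naoko, Naoko, Noriko,
Rieko, Rika, Rika, Rumiko, Rei, Reiko, Reiko, Sachiko, Sachiko, Sachiyo, Saki, Sayaka, Sayoko,
Sayuri, Seiko, Shiho, Shizuka, Sumiko, Takako, Takako, Tomoe, Tomoe, Tomoko, Touko, Yasuko,
Yasuko, Yasuyo, Yoko, Yoko, Yoko, Yoshiko, Yoshiko, Yoshiko, Yuka, Yuki, Yuki, Yukiko, Yuko,
Yuko
"""

# Split on commas and trim whitespace and newlines around each name
names = [name.strip() for name in names_text.split(",")]

n = len(names)
ko_count = sum(name.lower().endswith("ko") for name in names)
p_hat = ko_count / n           # sample proportion of names ending in "ko"
print(f"n = {n}, names ending in 'ko' = {ko_count}, p' = {p_hat:.4f}")
```

The sample proportion `p_hat` is what gets compared with the hypothesized 0.70 in the test the exercise asks for.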

Exercise  9.16.26 

Phillip's  Wish  by  Suzanne  Osorio 

My  nephew  likes  to  play 
Chasing  the  girls  makes  his  day. 
He  asked  his  mother 
If  it  is  okay 
To  get  his  ear  pierced. 
She  said,   ''No  way!'' 
To  poke  a  hole  through  your  ear, 
Is  not  what  I  want  for  you,  dear. 
He argued his point quite well.
Says even my macho pal, Mel,
Has gotten this done.
It's all just for fun.
C'mon please, mom, please, what the hell.
Again Phillip complained to his mother.
Saying half his friends (including their brothers)
Are piercing their ears
And they have no fears
He wants to be like the others.
She said, ''I think it's much less.
We must do a hypothesis test.
And if you are right,
I won't put up a fight.
But, if not, then my case will rest.''
We proceeded to call fifty guys
To see whose prediction would fly.
Nineteen of the fifty
Said piercing was nifty
And earrings they'd occasionally buy.
Then there's the other thirty-one.
Who said they'd never have this done.
So now this poem's finished.
Will his hopes be diminished.
Or will my nephew have his fun?

Exercise  9.16.27  (Solution  on  p.  426.) 

The  Craven  by  Mark  Salangsang 

Once  upon  a  morning  dreary 
In  stats  class  I  was  weak  and  weary. 
Pondering over last night's homework
Whose  answers  were  now  on  the  board 
This  I  did  and  nothing  more. 

While  I  nodded  nearly  napping 
Suddenly,  there  came  a  tapping. 
As  someone  gently  rapping. 
Rapping my head as I snore.
Quoth  the  teacher,   ''Sleep  no  more.'' 

''In  every  class  you  fall  asleep,'' 
The  teacher  said,  his  voice  was  deep. 
''So  a  tally  I've  begun  to  keep 
Of  every  class  you  nap  and  snore. 
The  percentage  being  forty-four.'' 

''My  dear  teacher  I  must  confess. 
While sleeping is what I do best.
The  percentage,  I  think,  must  be  less, 
A  percentage  less  than  forty-four.'' 
This I said and nothing more.

''We'll see,'' he said and walked away,
And  fifty  classes  from  that  day 
He  counted  till  the  month  of  May 
The  classes  in  which  I  napped  and  snored. 
The  number  he  found  was  twenty-four. 

At  a  significance  level  of  0.05, 
Please  tell  me  am  I  still  alive? 
Or  did  my  grade  just  take  a  dive 
Plunging  down  beneath  the  floor? 
Upon thee I hereby implore.

Exercise  9.16.28 

Toastmasters International cites a report by Gallup Poll that 40% of Americans fear public
speaking. A student believes that less than 40% of students at her school fear public speaking.
She randomly surveys 361 schoolmates and finds that 135 report they fear public speaking.
Conduct a hypothesis test to determine if the percent at her school is less than 40%. (Source:
http://toastmasters.org/artisan/detail.asp?CategoryID=1&SubCategoryID=10&ArticleID=429&Page=1)

Exercise  9.16.29  (Solution  on  p.  426.) 

68% of online courses taught at community colleges nationwide were taught by full-time faculty.
To test if 68% also represents California's percent for full-time faculty teaching the online classes,
Long Beach City College (LBCC), CA, was randomly selected for comparison. In the same year, 34
of the 44 online courses LBCC offered were taught by full-time faculty. Conduct a hypothesis test
to determine if 68% represents CA. NOTE: For more accurate results, use more CA community
colleges and this past year's data. (Sources: Growing by Degrees by Allen and Seaman; Amit
Schitai, Director of Instructional Technology and Distance Learning, LBCC).

Exercise  9.16.30 

According to an article in Bloomberg Businessweek, New York City's most recent adult smoking
rate is 14%. Suppose that a survey is conducted to determine this year's rate. Nine out of 70
randomly chosen N.Y. City residents reply that they smoke. Conduct a hypothesis test to determine if
the rate is still 14% or if it has decreased. (Source:
http://www.businessweek.com/news/2011-09-15/nyc-smoking-rate-falls-to-record-low-of-14-bloomberg-says.html)

Exercise  9.16.31  (Solution  on  p.  426.) 

The mean age of De Anza College students in a previous term was 26.6 years old. An instructor
thinks the mean age for online students is older than 26.6. She randomly surveys 56 online
students and finds that the sample mean is 29.4 with a standard deviation of 2.1. Conduct a
hypothesis test. (Source: http://research.fhda.edu/factbook/DAdemofs/Fact_sheet_da_2006w.pdf)

Exercise  9.16.32 

Registered nurses earned an average annual salary of $69,110. For that same year, a survey
was conducted of 41 California registered nurses to determine if the annual salary is higher than
$69,110 for California nurses. The sample average was $71,121 with a sample standard deviation of
$7,489. Conduct a hypothesis test. (Source: http://www.bls.gov/oes/current/oes291111.htm)

Exercise  9.16.33  (Solution  on  p.  426.) 

La  Leche  League  International  reports  that  the  mean  age  of  weaning  a  child  from  breastfeeding 
is age 4 to 5 worldwide. In America, most nursing mothers wean their children much earlier.
Suppose a random survey is conducted of 21 U.S. mothers who recently weaned their children.
The mean weaning age was 9 months (3/4 year) with a standard deviation of 4 months. Conduct
a hypothesis test to determine if the mean weaning age in the U.S. is less than 4 years old. (Source:
http://www.lalecheleague.org/Law/BAFeb01.html)


9.16.1  Try  these  multiple  choice  questions. 

Exercise  9.16.34  (Solution  on  p.  426.) 

When a new drug is created, the pharmaceutical company must subject it to testing before
receiving the necessary permission from the Food and Drug Administration (FDA) to market the drug.
Suppose the null hypothesis is "the drug is unsafe." What is the Type II Error?

A. To conclude the drug is safe when, in fact, it is unsafe.

B. To not conclude the drug is safe when, in fact, it is safe.

C. To conclude the drug is safe when, in fact, it is safe.

D. To not conclude the drug is unsafe when, in fact, it is unsafe.

The  next  two  questions  refer  to  the  following  information:  Over  the  past  few  decades,  public  health 
officials  have  examined  the  link  between  weight  concerns  and  teen  girls  smoking.  Researchers  surveyed  a 
group  of  273  randomly  selected  teen  girls  living  in  Massachusetts  (between  12  and  15  years  old).  After  four 
years  the  girls  were  surveyed  again.  Sixty-three  (63)  said  they  smoked  to  stay  thin.  Is  there  good  evidence 
that  more  than  thirty  percent  of  the  teen  girls  smoke  to  stay  thin? 

Exercise  9.16.35  (Solution  on  p.  426.) 

The  alternate  hypothesis  is 

A. p < 0.30

B. p ≤ 0.30

C. p ≥ 0.30

D. p > 0.30

Exercise  9.16.36  (Solution  on  p.  426.) 

After  conducting  the  test,  your  decision  and  conclusion  are 

A. Reject Ho: There is sufficient evidence to conclude that more than 30% of teen girls smoke to
stay thin.

B. Do not reject Ho: There is not sufficient evidence to conclude that less than 30% of teen girls
smoke to stay thin.

C. Do not reject Ho: There is not sufficient evidence to conclude that more than 30% of teen girls
smoke to stay thin.

D. Reject Ho: There is sufficient evidence to conclude that less than 30% of teen girls smoke to
stay thin.

The next three questions refer to the following information: A statistics instructor believes that fewer
than 20% of Evergreen Valley College (EVC) students attended the opening night midnight showing of
the latest Harry Potter movie. She surveys 84 of her students and finds that 11 of them attended the
midnight showing.

Exercise  9.16.37  (Solution  on  p.  427.) 

An appropriate alternative hypothesis is

A. p = 0.20

B. p > 0.20

C. p < 0.20

D. p ≤ 0.20


Exercise 9.16.38 (Solution on p. 427.)

At a 1% level of significance, an appropriate conclusion is:

A. There is insufficient evidence to conclude that the percent of EVC students that attended the
midnight showing of Harry Potter is less than 20%.

B. There is sufficient evidence to conclude that the percent of EVC students that attended the
midnight showing of Harry Potter is more than 20%.

C. There is sufficient evidence to conclude that the percent of EVC students that attended the
midnight showing of Harry Potter is less than 20%.

D. There is insufficient evidence to conclude that the percent of EVC students that attended the
midnight showing of Harry Potter is at least 20%.

Exercise  9.16.39  (Solution  on  p.  427.) 

The  Type  I  error  is  to  conclude  that  the  percent  of  EVC  students  who  attended  is 

A.  at  least  20%,  when  in  fact,  it  is  less  than  20%. 

B.  20%,  when  in  fact,  it  is  20%. 

C.  less  than  20%,  when  in  fact,  it  is  at  least  20%. 

D.  less  than  20%,  when  in  fact,  it  is  less  than  20%. 

The  next  two  questions  refer  to  the  following  information: 

It is believed that Lake Tahoe Community College (LTCC) Intermediate Algebra students get less than 7
hours  of  sleep  per  night,  on  average.  A  survey  of  22  LTCC  Intermediate  Algebra  students  generated  a 
mean  of  7.24  hours  with  a  standard  deviation  of  1.93  hours.  At  a  level  of  significance  of  5%,  do  LTCC 
Intermediate  Algebra  students  get  less  than  7  hours  of  sleep  per  night,  on  average? 

Exercise 9.16.40 (Solution on p. 427.)

The distribution to be used for this test is X̄ ~ ______

A. N(7.24, 1.93/√22)

B. N(7.24, 1.93)

C. t22

D. t21

Exercise 9.16.41 (Solution on p. 427.)

The Type II error is to not reject that the mean number of hours of sleep LTCC students get per
night is at least 7 when, in fact, the mean number of hours

A. is more than 7 hours.

B. is at most 7 hours.

C. is at least 7 hours.

D. is less than 7 hours.

The next three questions refer to the following information: Previously, an organization reported that
teenagers spent 4.5 hours per week, on average, on the phone. The organization thinks that, currently,
the mean is higher. Fifteen (15) randomly chosen teenagers were asked how many hours per week they
spend on the phone. The sample mean was 4.75 hours with a sample standard deviation of 2.0. Conduct a
hypothesis test.

Exercise  9.16.42  (Solution  on  p.  427.) 

The null and alternate hypotheses are:

A. Ho: x̄ = 4.5, Ha: x̄ > 4.5

B. Ho: μ ≥ 4.5, Ha: μ < 4.5

C. Ho: μ = 4.75, Ha: μ > 4.75

D. Ho: μ = 4.5, Ha: μ > 4.5

Exercise  9.16.43  (Solution  on  p.  427.) 

At a significance level of α = 0.05, what is the correct conclusion?

A.  There  is  enough  evidence  to  conclude  that  the  mean  number  of  hours  is  more  than  4.75 

B.  There  is  enough  evidence  to  conclude  that  the  mean  number  of  hours  is  more  than  4.5 

C.  There  is  not  enough  evidence  to  conclude  that  the  mean  number  of  hours  is  more  than  4.5 

D.  There  is  not  enough  evidence  to  conclude  that  the  mean  number  of  hours  is  more  than  4.75 


Exercise  9.16.44  (Solution  on  p.  427.) 

The  Type  I  error  is: 

A.  To  conclude  that  the  current  mean  hours  per  week  is  higher  than  4.5,  when  in  fact,  it  is  higher. 

B. To conclude that the current mean hours per week is higher than 4.5, when in fact, it is the
same.

C. To conclude that the mean hours per week currently is 4.5, when in fact, it is higher.

D. To conclude that the mean hours per week currently is no higher than 4.5, when in fact, it is
not higher.

9.17 Review


Exercise  9.17.1  (Solution  on  p.  427.) 

Rebecca  and  Matt  are  14  year  old  twins.  Matt's  height  is  2  standard  deviations  below  the  mean 
for  14  year  old  boys'  height.  Rebecca's  height  is  0.10  standard  deviations  above  the  mean  for  14 
year  old  girls'  height.  Interpret  this. 

A.  Matt  is  2.1  inches  shorter  than  Rebecca 

B.  Rebecca  is  very  tall  compared  to  other  14  year  old  girls. 

C.  Rebecca  is  taller  than  Matt. 

D.  Matt  is  shorter  than  the  average  14  year  old  boy. 

Exercise  9.17.2  (Solution  on  p.  427.) 

Construct a histogram of the IPO data (see Table of Contents, 14. Appendix, Data Sets). Use 5
intervals.

The  next  three  exercises  refer  to  the  following  information:  Ninety  homeowners  were  asked  the  number 
of  estimates  they  obtained  before  having  their  homes  fumigated.  X  =  the  number  of  estimates. 


X     Rel. Freq.     Cumulative Rel. Freq.
1     0.3
2     0.2
4     0.4
5     0.1

Table 9.5


Complete  the  cumulative  relative  frequency  column. 
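A cumulative relative frequency column is a running total of the relative frequencies. The following sketch shows the mechanics for Table 9.5 using `itertools.accumulate`; rounding is applied only to hide floating-point noise, and the variable names are illustrative.

```python
from itertools import accumulate

x_values = [1, 2, 4, 5]
rel_freq = [0.3, 0.2, 0.4, 0.1]

# Running total of the relative frequencies, rounded to suppress float noise
cum_rel_freq = [round(total, 10) for total in accumulate(rel_freq)]

print(" X  Rel. Freq.  Cumulative Rel. Freq.")
for x, rf, crf in zip(x_values, rel_freq, cum_rel_freq):
    print(f"{x:>2}  {rf:<10.1f}  {crf:.1f}")

# The same relative frequencies give the sample mean as a weighted sum
sample_mean = sum(x * rf for x, rf in zip(x_values, rel_freq))
```

The last entry of the cumulative column should always be 1, which makes a convenient check on the table.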

Exercise  9.17.3  (Solution  on  p.  427.) 

Calculate  the  sample  mean  (a),  the  sample  standard  deviation  (b)  and  the  percent  of  the  estimates 
that  fall  at  or  below  4  (c). 

Exercise  9.17.4  (Solution  on  p.  427.) 

Calculate the median, M, the first quartile, Q1, the third quartile, Q3. Then construct a boxplot of
the data.

Exercise  9.17.5  (Solution  on  p.  427.) 

The middle 50% of the data are between ______ and ______.

The  next  three  questions  refer  to  the  following  table:  Seventy  5th  and  6th  graders  were  asked  their  favorite 
dinner. 


               Pizza    Hamburgers    Spaghetti    Fried shrimp
5th grader     15       6             9            0
6th grader     15       7             10           8

Table 9.6
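Joint, complementary, and conditional probabilities can all be read off a two-way table such as Table 9.6 by choosing the right numerator and denominator. The sketch below illustrates the three patterns with exact fractions; the dictionary layout and the helper `column_total` are assumptions made for this example.

```python
from fractions import Fraction

dinners = ["Pizza", "Hamburgers", "Spaghetti", "Fried shrimp"]
counts = {
    "5th grader": [15, 6, 9, 0],
    "6th grader": [15, 7, 10, 8],
}
total = sum(sum(row) for row in counts.values())   # all children surveyed

def column_total(dinner):
    """Number of children (both grades) who prefer the given dinner."""
    j = dinners.index(dinner)
    return sum(row[j] for row in counts.values())

# Joint probability: one cell divided by the grand total
p_6th_and_shrimp = Fraction(counts["6th grader"][dinners.index("Fried shrimp")], total)

# Complement: 1 minus the marginal probability of pizza
p_not_pizza = 1 - Fraction(column_total("Pizza"), total)

# Conditional probability: the denominator shrinks to the spaghetti column
p_5th_given_spaghetti = Fraction(counts["5th grader"][dinners.index("Spaghetti")],
                                 column_total("Spaghetti"))

print(p_6th_and_shrimp, p_not_pizza, p_5th_given_spaghetti)
```

`Fraction` reduces results to lowest terms, so a printed value may look different from an unreduced answer choice while being equal to it.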


Exercise  9.17.6  (Solution  on  p.  427.) 

Find  the  probability  that  one  randomly  chosen  child  is  in  the  6th  grade  and  prefers  fried  shrimp. 


This content is available online at <http://cnx.org/content/m17013/1.12/>.

Exercise 9.17.7 (Solution on p. 427.)

Find  the  probability  that  a  child  does  not  prefer  pizza. 


A. 30/70

B. 30/40

C. 40/70

D. 1


Exercise  9.17.8  (Solution  on  p.  427.) 

Find  the  probability  a  child  is  in  the  5th  grade  given  that  the  child  prefers  spaghetti. 

A. 9/19

B. 9/70

C. 9/30

D. 19/70

Exercise  9.17.9  (Solution  on  p.  427.) 

A  sample  of  convenience  is  a  random  sample. 

A.  true 

B.  false 

Exercise  9.17.10  (Solution  on  p.  427.) 

A  statistic  is  a  number  that  is  a  property  of  the  population. 

A.  true 

B.  false 

Exercise  9.17.11  (Solution  on  p.  427.) 

You  should  always  throw  out  any  data  that  are  outliers. 

A.  true 

B.  false 

Exercise  9.17.12  (Solution  on  p.  427.) 

Lee bakes pies for a small restaurant in Felton, CA. She generally bakes 20 pies in a day, on the
average. Of interest is the number of pies she bakes each day.

a.  Define  the  Random  Variable  X. 

b.  State  the  distribution  for  X. 

c.  Find  the  probability  that  Lee  bakes  more  than  25  pies  in  any  given  day. 

Exercise  9.17.13  (Solution  on  p.  428.) 

Six  different  brands  of  Italian  salad  dressing  were  randomly  selected  at  a  supermarket.  The  grams 
of  fat  per  serving  are  7,  7, 9, 6, 8, 5.  Assume  that  the  underlying  distribution  is  normal.  Calculate  a 
95%  confidence  interval  for  the  population  mean  grams  of  fat  per  serving  of  Italian  salad  dressing 
sold  in  supermarkets. 
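For a small sample from a normal population with unknown standard deviation, the confidence interval uses the Student's t distribution. The following sketch works through the salad dressing data; the critical value 2.571 (5 degrees of freedom, 0.025 in each tail) is taken from a standard t table rather than computed, and the name `ebm` (error bound for the mean) is illustrative.

```python
from math import sqrt
from statistics import mean, stdev

fat_grams = [7, 7, 9, 6, 8, 5]

n = len(fat_grams)
x_bar = mean(fat_grams)
s = stdev(fat_grams)           # sample standard deviation (n - 1 divisor)

T_CRIT = 2.571                 # t table: df = n - 1 = 5, 95% confidence
ebm = T_CRIT * s / sqrt(n)     # error bound for the mean
lower, upper = x_bar - ebm, x_bar + ebm
print(f"x-bar = {x_bar}, s = {s:.4f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

With only six observations the t critical value is noticeably larger than the normal value 1.96, which widens the interval.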

Exercise  9.17.14  (Solution  on  p.  428.) 

Given:  uniform,  exponential,  normal  distributions.  Match  each  to  a  statement  below. 


a. mean = median ≠ mode

b. mean > median > mode

c. mean = median = mode

Available for free at Connexions <http://cnx.org/content/col10522/1.40>


CHAPTER 9. HYPOTHESIS TESTING: SINGLE MEAN AND SINGLE PROPORTION




9.18  Lab:  Hypothesis  Testing  of  a  Single  Mean  and  Single  Proportion 

Class  Time: 
Names: 

9.18.1  Student  Learning  Outcomes: 

•  The  student  will  select  the  appropriate  distributions  to  use  in  each  case. 

• The student will conduct hypothesis tests and interpret the results.


9.18.2  Television  Survey 

In a recent survey, it was stated that Americans watch television on average four hours per day. Assume
that σ = 1. Using your class as the sample, conduct a hypothesis test to determine if the average for
students at your school is lower.

1. Ho:

2. Ha:

3.  In  words,  define  the  random  variable.  = 

4.  The  distribution  to  use  for  the  test  is: 

5.  Determine  the  test  statistic  using  your  data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance.
a. Graph:

This content is available online at <http://cnx.org/content/m17007/1.12/>.

Figure 9.9

b.  Determine  the  p-value: 

7. Do you or do you not reject the null hypothesis? Why?

8.  Write  a  clear  conclusion  using  a  complete  sentence. 

9.18.3  Language  Survey 

About  42.3%  of  Californians  and  19.6%  of  all  Americans  over  age  5  speak  a  language  other  than  English 
at  home.  Using  your  class  as  the  sample,  conduct  a  hypothesis  test  to  determine  if  the  percent  of  the 
students  at  your  school  that  speak  a  language  other  than  English  at  home  is  different  from  42.3%.  (Source: 
http://www.census.gov/hhes/socdemo/language/  ) 

1. Ho:

2. Ha:

3.  In  words,  define  the  random  variable.  = 

4.  The  distribution  to  use  for  the  test  is: 

5.  Determine  the  test  statistic  using  your  data. 

6.  Draw  a  graph  and  label  it  appropriately.  Shade  the  actual  level  of  significance. 



a.  Graph: 




Figure  9.10 

b.  Determine  the  p-value: 

7.  Do  you  or  do  you  not  reject  the  null  hypothesis?  Why? 

8.  Write  a  clear  conclusion  using  a  complete  sentence. 

9.18.4  Jeans  Survey 

Suppose  that  young  adults  own  an  average  of  3  pairs  of  jeans.  Survey  8  people  from  your  class  to  determine 
if  the  average  is  higher  than  3. 

1. Ho:

2. Ha:

3.  In  words,  define  the  random  variable.  = 

4.  The  distribution  to  use  for  the  test  is: 

5.  Determine  the  test  statistic  using  your  data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance.
a. Graph:



Figure  9.11 

b.  Determine  the  p-value: 

7. Do you or do you not reject the null hypothesis? Why?

8.  Write  a  clear  conclusion  using  a  complete  sentence. 
Solutions to Exercises in Chapter 9

Solutions  to  Practice  1:  Single  Mean,  Known  Population  Standard  Deviation 

Solution  to  Exercise  9.13.1  (p.  397) 

Means 

Solution  to  Exercise  9.13.2  (p.  397) 

a. Ho: μ = 2.5 (or, Ho: μ ≤ 2.5)
b. Ha: μ > 2.5

Solution  to  Exercise  9.13.3  (p.  397) 

right-tailed 

Solution  to  Exercise  9.13.4  (p.  397) 

X 

Solution  to  Exercise  9.13.5  (p.  397) 

The  mean  time  spent  in  jail  for  26  first  time  convicted  burglars 
Solution  to  Exercise  9.13.6  (p.  397) 

Yes,  1.5 

Solution  to  Exercise  9.13.7  (p.  397) 

a.  3 

b.  1.5 

c.  1.8 

d.  26 

Solution  to  Exercise  9.13.8  (p.  397) 

a 

Solution  to  Exercise  9.13.9  (p.  397) 

N(2.5, 1.5/√26)
Solution  to  Exercise  9.13.11  (p.  398) 

0.0446 

Solution  to  Exercise  9.13.12  (p.  398) 

a. Reject the null hypothesis

Solutions  to  Practice  2:  Single  Mean,  Unknown  Population  Standard  Deviation 

Solution  to  Exercise  9.14.1  (p.  399) 

averages 

Solution  to  Exercise  9.14.2  (p.  399) 

a. Ho: μ = 15

b. Ha: μ ≠ 15

Solution  to  Exercise  9.14.3  (p.  399) 

two-tailed 

Solution  to  Exercise  9.14.4  (p.  399) 

X 

Solution  to  Exercise  9.14.5  (p.  399) 

the  mean  time  spent  on  death  row  for  the  75  inmates 

Solution  to  Exercise  9.14.6  (p.  399) 

No 

Solution  to  Exercise  9.14.7  (p.  399) 
a. 17.4

b.  s 

c.  75 

Solution  to  Exercise  9.14.8  (p.  399) 

t-test

Solution  to  Exercise  9.14.9  (p.  399) 

t74

Solution  to  Exercise  9.14.11  (p.  400) 
0.0015 

Solution  to  Exercise  9.14.12  (p.  400) 

a. Reject the null hypothesis

Solutions  to  Practice  3:  Single  Proportion 

Solution  to  Exercise  9.15.1  (p.  401) 

Proportions 

Solution  to  Exercise  9.15.2  (p.  401) 

a.  Ho  :  p  =  0.095 

b. Ha: p < 0.095

Solution  to  Exercise  9.15.3  (p.  401) 

left-tailed 

Solution  to  Exercise  9.15.4  (p.  401) 

P' 

Solution  to  Exercise  9.15.5  (p.  401) 

the  proportion  of  people  in  that  town  surveyed  suffering  from  depression  or  a  depressive  illness 
Solution  to  Exercise  9.15.6  (p.  401) 

a.  7 

b.  100 

c.  0.07 

Solution  to  Exercise  9.15.7  (p.  401) 
0.0293 

Solution  to  Exercise  9.15.8  (p.  401) 

Normal 

Solution  to  Exercise  9.15.10  (p.  402) 

0.1969 

Solution  to  Exercise  9.15.11  (p.  402) 
a.  Do  not  reject  the  null  hypothesis 

Solutions  to  Homework 
Solution  to  Exercise  9.16.1  (p.  403) 

a. Ho: μ = 34; Ha: μ ≠ 34

c. Ho: μ ≥ 100,000; Ha: μ < 100,000

d. Ho: p = 0.29; Ha: p ≠ 0.29
g. Ho: p = 0.50; Ha: p ≠ 0.50

i. Ho: p ≥ 0.11; Ha: p < 0.11
Solution to Exercise 9.16.2 (p. 403)

a. Type I error: We conclude that the mean is not 34 years, when it really is 34 years. Type II error: We do
not conclude that the mean is not 34 years, when it is not really 34 years.

c. Type I error: We conclude that the mean is less than $100,000, when it really is at least $100,000. Type II
error: We do not conclude that the mean is less than $100,000, when it is really less than $100,000.

d. Type I error: We conclude that the proportion of high school seniors who get drunk each month is not 29%,
when it really is 29%. Type II error: We do not conclude that the proportion of high school seniors who get
drunk each month is not 29%, when it is really not 29%.

i. Type I error: We conclude that the proportion is less than 11%, when it is really at least 11%. Type II error:
We do not conclude that the proportion is less than 11%, when it really is less than 11%.

Solution  to  Exercise  9.16.5  (p.  403) 

e.  z  =  -2.71 

f.  0.0034 

h. Decision: Reject null; Conclusion: μ < 19

i.  (17.449,18.757) 

Solution  to  Exercise  9.16.7  (p.  404) 

e.  3.5 

f.  0.0005 

h. Decision: Reject null; Conclusion: μ > 4.5

i.  (4.7553,5.4447) 

Solution  to  Exercise  9.16.9  (p.  404) 

e.  2.7 

f.  0.0042 

h.  Decision:  Reject  Null 

i.  (80.789,85.211) 

Solution  to  Exercise  9.16.11  (p.  404) 

d.  tn 

e.  1.96 

f.  0.0380 

h. Decision: Reject null when α = 0.05; do not reject null when α = 0.01

i.  (3.8865,5.9468) 

Solution  to  Exercise  9.16.13  (p.  404) 

e.  -1.64 

f.  0.1000 

h.  Decision:  Do  not  reject  null 

i.  (0.3216,0.4784) 

Solution  to  Exercise  9.16.15  (p.  405) 
d. 

e.  -1.33 

f.  0.1086 

h.  Decision:  Do  not  reject  null 

i.  (51.886,62.114) 

Solution  to  Exercise  9.16.19  (p.  406) 
e. 1.65

f.  0.0984 

h.  Decision:  Do  not  reject  null 

i.  (0.6836,0.8533) 

Solution  to  Exercise  9.16.21  (p.  407) 

e.  -2.39 

f.  0.0093 

h.  Decision:  Reject  null 

i.  (91.854,100.15) 

Solution  to  Exercise  9.16.23  (p.  408) 

e.  -1.82 

f.  0.0345 

h. Decision: Do not reject null

i.  (0.5331,0.7685) 

Solution  to  Exercise  9.16.25  (p.  410) 

e.  z  =  -2.99 

f.  0.0014 

h.  Decision:  Reject  null;  Conclusion:  p  <  .70 

i.  (0.4529,0.6582) 

Solution  to  Exercise  9.16.27  (p.  411) 

e.  0.57 

f.  0.7156 

h.  Decision:  Do  not  reject  null 

i.  (0.3415,0.6185) 

Solution  to  Exercise  9.16.29  (p.  412) 

e.  1.32 

f.  0.1873 

h.  Decision:  Do  not  reject  null 

i.  (0.65,0.90) 

Solution  to  Exercise  9.16.31  (p.  412) 

e.  9.98 

f.  0.0000 

h.  Decision:  Reject  null 

i.  (28.8,30.0) 

Solution  to  Exercise  9.16.33  (p.  412) 

e.  -44.7 

f.  0.0000 

h. Decision: Reject null

i.  (0.60,0.90)  -  in  years 

Solution  to  Exercise  9.16.34  (p.  413) 
B 

Solution  to  Exercise  9.16.35  (p.  413) 
D 
Solution to Exercise 9.16.36 (p. 413)
C

Solution to Exercise 9.16.37 (p. 413)
C

Solution to Exercise 9.16.38 (p. 414)
A

Solution to Exercise 9.16.39 (p. 414)
C

Solution to Exercise 9.16.40 (p. 414)
D

Solution to Exercise 9.16.41 (p. 414)
D

Solution to Exercise 9.16.42 (p. 415)
D

Solution to Exercise 9.16.43 (p. 415)
C

Solution to Exercise 9.16.44 (p. 415)
B

Solutions  to  Review 

Solution  to  Exercise  9.17.1  (p.  416) 
D 

Solution  to  Exercise  9.17.2  (p.  416) 

No  solution  provided.  There  are  several  ways  in  which  the  histogram  could  be  constructed. 
Solution  to  Exercise  9.17.3  (p.  416) 

a.  2.8 

b.  1.48 

c.  90% 

Solution  to  Exercise  9.17.4  (p.  416) 
M = 3; Q1 = 1; Q3 = 4
Solution  to  Exercise  9.17.5  (p.  416) 

1  and  4 

Solution  to  Exercise  9.17.6  (p.  416) 

D 

Solution  to  Exercise  9.17.7  (p.  417) 

C 

Solution  to  Exercise  9.17.8  (p.  417) 

A 

Solution  to  Exercise  9.17.9  (p.  417) 
B 

Solution  to  Exercise  9.17.10  (p.  417) 

B 

Solution  to  Exercise  9.17.11  (p.  417) 

B 

Solution  to  Exercise  9.17.12  (p.  417) 

b.  P  (20) 

c.  0.1122 
Solution to Exercise 9.17.13 (p. 417)

CI:  (5.52,8.48) 

Solution  to  Exercise  9.17.14  (p.  417) 

a.  uniform 

b.  exponential 

c.  normal 
Chapter 10


Hypothesis  Testing:  Two  Means,  Paired 
Data,  Two  Proportions 

10.1  Hypothesis  Testing:  Two  Population  Means  and  Two  Population 
Proportions^ 

10.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

• Classify hypothesis tests by type.

• Conduct and interpret hypothesis tests for two population means, population standard deviations
known.

• Conduct and interpret hypothesis tests for two population means, population standard deviations
unknown.

•  Conduct  and  interpret  hypothesis  tests  for  two  population  proportions. 

•  Conduct  and  interpret  hypothesis  tests  for  matched  or  paired  samples. 


10.1.2  Introduction 

Studies  often  compare  two  groups.  For  example,  researchers  are  interested  in  the  effect  aspirin  has  in 
preventing  heart  attacks.  Over  the  last  few  years,  newspapers  and  magazines  have  reported  about  various 
aspirin  studies  involving  two  groups.  Typically,  one  group  is  given  aspirin  and  the  other  group  is  given  a 
placebo.  Then,  the  heart  attack  rate  is  studied  over  several  years. 

There are other situations that deal with the comparison of two groups. For example, studies compare
various diet and exercise programs. Politicians compare the proportion of individuals from different income
brackets who might vote for them. Students are interested in whether SAT or GRE preparatory courses
really help raise their scores.

In the previous chapter, you learned to conduct hypothesis tests on single means and single proportions.
You will expand upon that in this chapter. You will compare two means or two proportions to each other.
The general procedure is still the same, just expanded.


^This  content  is  available  online  at  <http://cnx.org/content/ml7029/1.9/>. 
To compare two means or two proportions, you work with two groups. The groups are classified either as
independent or matched pairs. Independent groups mean that the two samples taken are independent,
that is, sample values selected from one population are not related in any way to sample values selected
from the other population. Matched pairs consist of two samples that are dependent. The parameter tested
using matched pairs is the population mean. The parameters tested using independent groups are either
population means or population proportions.

NOTE: This chapter relies on either a calculator or a computer to calculate the degrees of freedom,
the test statistics, and p-values. TI-83+ and TI-84 instructions are included as well as the test
statistic formulas. When using the TI-83+/TI-84 calculators, we do not need to separate two population
means, independent groups, population variances unknown into large and small sample sizes.
However, most statistical computer software has the ability to differentiate these tests.

This  chapter  deals  with  the  following  hypothesis  tests: 
Independent  groups  (samples  are  independent) 

•  Test  of  two  population  means. 

•  Test  of  two  population  proportions. 

Matched  or  paired  samples  (samples  are  dependent) 

•  Becomes  a  test  of  one  population  mean. 


10.2  Comparing  Two  Independent  Population  Means  with  Unknown 
Population  Standard  Deviations^ 


1.  The  two  independent  samples  are  simple  random  samples  from  two  distinct  populations. 

2. Both populations are normally distributed with the population means and standard deviations
unknown, unless the sample sizes are greater than 30. In that case, the populations need not be normally
distributed.

NOTE: The test comparing two independent population means with unknown and possibly
unequal population standard deviations is called the Aspin-Welch t-test. The degrees of freedom
formula was developed by Aspin-Welch.

The comparison of two population means is very common. A difference between the two samples depends
on both the means and the standard deviations. Very different means can occur by chance if there is great
variation among the individual samples. In order to account for the variation, we take the difference of
the sample means, X̄1 − X̄2, and divide by the standard error (shown below) in order to standardize the
difference. The result is a t-score test statistic (shown below).

Because we do not know the population standard deviations, we estimate them using the two sample
standard deviations from our independent samples. For the hypothesis test, we calculate the estimated
standard deviation, or standard error, of the difference in sample means, X̄1 − X̄2.

The standard error is:

\[ \sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}} \]  (10.1)

The  test  statistic  (t-score)  is  calculated  as  follows: 


^This  content  is  available  online  at  <http: / /cnx.org/content/ml7025/1.18/>. 
t-score

\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}}} \]  (10.2)


where: 

• s1 and s2, the sample standard deviations, are estimates of σ1 and σ2, respectively.

• σ1 and σ2 are the unknown population standard deviations.

• x̄1 and x̄2 are the sample means. μ1 and μ2 are the population means.

The degrees of freedom (df) is a somewhat complicated calculation. However, a computer or calculator
calculates it easily. The df is not always a whole number. The test statistic calculated above is approximated
by the Student's t-distribution with df as follows:

Degrees of freedom

\[ df = \frac{\left(\frac{(s_1)^2}{n_1} + \frac{(s_2)^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{(s_1)^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{(s_2)^2}{n_2}\right)^2} \]  (10.3)


When both sample sizes n1 and n2 are five or larger, the Student's t approximation is very good. Notice that
the sample variances (s1)^2 and (s2)^2 are not pooled. (If the question comes up, do not pool the variances.)

NOTE:  It  is  not  necessary  to  compute  this  by  hand.  A  calculator  or  computer  easily  computes  it. 

Example  10.1:  Independent  groups 

The average amount of time boys and girls ages 7 through 11 spend playing sports each day is
believed to be the same. An experiment is done and data are collected, resulting in the table below.
Both populations have a normal distribution.

        Sample Size   Average Number of Hours Playing Sports Per Day   Sample Standard Deviation
Girls   9             2 hours                                          √0.75
Boys    16            3.2 hours                                        1.00

Table 10.1


Problem 

Is  there  a  difference  in  the  mean  amount  of  time  boys  and  girls  ages  7  through  11  play  sports  each 
day?  Test  at  the  5%  level  of  significance. 

Solution 

The population standard deviations are not known. Let g be the subscript for girls and b be the
subscript for boys. Then, μg is the population mean for girls and μb is the population mean for
boys. This is a test of two independent groups, two population means.

Random variable: X̄g − X̄b = difference in the sample mean amount of time girls and boys play
sports each day.

Ho: μg = μb    μg − μb = 0




Ha: μg ≠ μb    μg − μb ≠ 0

The words "the same" tell you Ho has an "=". Since there are no other words to indicate Ha, then
assume "is different." This is a two-tailed test.

Distribution for the test: Use t_df, where df is calculated using the df formula for independent
groups, two population means. Using a calculator, df is approximately 18.8462. Do not pool the
variances.

Calculate  the  p-value  using  a  student's-t  distribution:  p-value  =  0.0054 
Graph:

[Figure 10.1: graph of the p-value; from Ho, μg − μb = 0]

sg = √0.75
sb = 1

So, x̄g − x̄b = 2 − 3.2 = −1.2

Half the p-value is below −1.2 and half is above 1.2.

Make a decision: Since α > p-value, reject Ho.

This means you reject μg = μb. The means are different.

Conclusion:  At  the  5%  level  of  significance,  the  sample  data  show  there  is  sufficient  evidence  to 
conclude  that  the  mean  number  of  hours  that  girls  and  boys  aged  7  through  11  play  sports  per  day 
is  different  (mean  number  of  hours  boys  aged  7  through  11  play  sports  per  day  is  greater  than  the 
mean  number  of  hours  played  by  girls  OR  the  mean  number  of  hours  girls  aged  7  through  11  play 
sports  per  day  is  greater  than  the  mean  number  of  hours  played  by  boys). 

NOTE: TI-83+ and TI-84: Press STAT. Arrow over to TESTS and press 4:2-SampTTest. Arrow over
to Stats and press ENTER. Arrow down and enter 2 for the first sample mean, √0.75 for Sx1, 9
for n1, 3.2 for the second sample mean, 1 for Sx2, and 16 for n2. Arrow down to μ1: and arrow
to does not equal μ2. Press ENTER. Arrow down to Pooled: and No. Press ENTER. Arrow down to
Calculate and press ENTER. The p-value is p = 0.0054, the df is approximately 18.8462, and the
test statistic is -3.14. Do the procedure again but instead of Calculate do Draw.
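
The figures in Example 10.1 can also be cross-checked without a calculator. The following is a minimal Python sketch, not part of the original text: the function names welch_t and t_two_tailed_p are hypothetical, and the p-value is approximated by numerically integrating the Student's t density with Simpson's rule (an assumption made here so that only the standard library is needed).

```python
import math

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Return (standard error, t-score, df) without pooling the variances."""
    v1 = s1 ** 2 / n1          # (s1)^2 / n1
    v2 = s2 ** 2 / n2          # (s2)^2 / n2
    se = math.sqrt(v1 + v2)    # standard error of the difference in sample means
    t = (mean1 - mean2) / se   # under Ho, mu1 - mu2 = 0
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, t, df

def t_two_tailed_p(t, df, steps=2000):
    """Approximate the two-tailed p-value by Simpson's rule on the t density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def density(x):
        return c * (1 + x * x / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps              # steps must be even for Simpson's rule
    total = density(0) + density(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

# Girls: n = 9, mean = 2, s = sqrt(0.75); boys: n = 16, mean = 3.2, s = 1
se, t, df = welch_t(2.0, math.sqrt(0.75), 9, 3.2, 1.0, 16)
p = t_two_tailed_p(t, df)
print(round(t, 2), round(df, 4), round(p, 4))
```

The printed values should agree with the calculator results quoted in the note above (test statistic, df, and p-value), up to rounding.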
Example 10.2

A study is done by a community group in two neighboring colleges to determine which one
graduates students with more math classes. College A samples 11 graduates. Their average is 4 math
classes with a standard deviation of 1.5 math classes. College B samples 9 graduates. Their
average is 3.5 math classes with a standard deviation of 1 math class. The community group believes
that a student who graduates from college A has taken more math classes, on the average. Both
populations have a normal distribution. Test at a 1% significance level. Answer the following
questions.

Problem  1  (Solution  on  p.  466.) 

Is  this  a  test  of  two  means  or  two  proportions? 

Problem  2  (Solution  on  p.  466.) 

Are the population standard deviations known or unknown?

Problem  3  (Solution  on  p.  466.) 

Which  distribution  do  you  use  to  perform  the  test? 

Problem  4  (Solution  on  p.  466.) 

What  is  the  random  variable? 

Problem  5  (Solution  on  p.  466.) 

What are the null and alternate hypotheses?

Problem  6  (Solution  on  p.  466.) 

Is  this  test  right,  left,  or  two  tailed? 

Problem  7  (Solution  on  p.  466.) 

What  is  the  p-value? 

Problem  8  (Solution  on  p.  466.) 

Do you reject or not reject the null hypothesis?

Conclusion: 

At  the  1%  level  of  significance,  from  the  sample  data,  there  is  not  sufficient  evidence  to  conclude 
that  a  student  who  graduates  from  college  A  has  taken  more  math  classes,  on  the  average,  than  a 
student  who  graduates  from  college  B. 

10.3 Comparing Two Independent Population Means with Known
Population Standard Deviations^

Even though this situation is not likely (knowing the population standard deviations is not likely), the
following example illustrates hypothesis testing for independent means, known population standard
deviations. The sampling distribution for the difference between the means is normal and both populations
must be normal. The random variable is X̄1 − X̄2. The normal distribution has the following format:

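Normal distribution

\[ \bar{X}_1 - \bar{X}_2 \sim N\left[\mu_1 - \mu_2, \sqrt{\frac{(\sigma_1)^2}{n_1} + \frac{(\sigma_2)^2}{n_2}}\right] \]  (10.4)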


^This  content  is  available  online  at  <http://cnx.org/content/ml7042/1.10/>. 
The standard deviation is:

\[ \sqrt{\frac{(\sigma_1)^2}{n_1} + \frac{(\sigma_2)^2}{n_2}} \]  (10.5)

The test statistic (z-score) is:

\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{(\sigma_1)^2}{n_1} + \frac{(\sigma_2)^2}{n_2}}} \]  (10.6)


Example  10.3 

Independent groups, population standard deviations known: The mean lasting time of 2
competing floor waxes is to be compared. Twenty floors are randomly assigned to test each wax. Both
populations have a normal distribution. The following table is the result.

Wax   Sample Mean Number of Months Floor Wax Lasts   Population Standard Deviation
1     3                                              0.33
2     2.9                                            0.36

Table 10.2

Problem 

Does  the  data  indicate  that  wax  1  is  more  effective  than  wax  2?  Test  at  a  5%  level  of  significance. 
Solution 

This  is  a  test  of  two  independent  groups,  two  population  means,  population  standard  deviations 
known. 

Random  Variable:  Xi  —  X2  =  difference  in  the  mean  number  of  months  the  competing  floor  waxes 
last. 

Ho: μ1 ≤ μ2
Ha: μ1 > μ2

The words "is more effective" say that wax 1 lasts longer than wax 2, on the average. "Longer"
is a ">" symbol and goes into Ha. Therefore, this is a right-tailed test.

Distribution for the test: The population standard deviations are known so the distribution is
normal. Using the formula above, the distribution is:

\[ \bar{X}_1 - \bar{X}_2 \sim N\left[0, \sqrt{\frac{0.33^2}{20} + \frac{0.36^2}{20}}\right] \]

Since μ1 ≤ μ2, then μ1 − μ2 ≤ 0 and the mean for the normal distribution is 0.
Calculate the p-value using the normal distribution: p-value = 0.1799
Graph: 
[Figure 10.2: graph of the p-value = 0.1799; from Ho, μ1 − μ2 ≤ 0]

x̄1 − x̄2 = 3 − 2.9 = 0.1

Compare α and the p-value: α = 0.05 and p-value = 0.1799. Therefore, α < p-value.
Make a decision: Since α < p-value, do not reject Ho.

Conclusion:  At  the  5%  level  of  significance,  from  the  sample  data,  there  is  not  sufficient  evidence 
to  conclude  that  the  mean  time  wax  1  lasts  is  longer  (wax  1  is  more  effective)  than  the  mean  time 
wax  2  lasts. 

NOTE: TI-83+ and TI-84: Press STAT. Arrow over to TESTS and press 3:2-SampZTest. Arrow over
to Stats and press ENTER. Arrow down and enter .33 for sigma1, .36 for sigma2, 3 for the first
sample mean, 20 for n1, 2.9 for the second sample mean, and 20 for n2. Arrow down to μ1: and
arrow to > μ2. Press ENTER. Arrow down to Calculate and press ENTER. The p-value is p = 0.1799
and the test statistic is 0.9157. Do the procedure again but instead of Calculate do Draw.
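
As a cross-check on Example 10.3, the z-score and right-tailed p-value can be recomputed in a few lines. This is a minimal Python sketch using only the standard library; the function name two_sample_z is hypothetical and not part of the original text.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sigma1, n1, mean2, sigma2, n2):
    """Return (standard deviation of the difference, z-score) under Ho: mu1 - mu2 = 0."""
    sd = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)
    z = (mean1 - mean2) / sd
    return sd, z

# Wax 1: mean 3, sigma 0.33, n = 20; wax 2: mean 2.9, sigma 0.36, n = 20
sd, z = two_sample_z(3.0, 0.33, 20, 2.9, 0.36, 20)

# Right-tailed test (Ha: mu1 > mu2): area to the right of z under the standard normal
p_right = 1 - NormalDist().cdf(z)
print(round(z, 4), round(p_right, 4))
```

Because the p-value is larger than α = 0.05, the sketch leads to the same decision as the example: do not reject Ho.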


10.4  Comparing  Two  Independent  Population  Proportions* 

1. The two independent samples are simple random samples that are independent.

2.  The  number  of  successes  is  at  least  five  and  the  number  of  failures  is  at  least  five  for  each  of  the 
samples. 

Comparing two proportions, like comparing two means, is common. If two estimated proportions are
different, it may be due to a difference in the populations or it may be due to chance. A hypothesis test can
help determine if a difference in the estimated proportions (P'A − P'B) reflects a difference in the population
proportions.

The difference of two proportions follows an approximate normal distribution. Generally, the null
hypothesis states that the two proportions are the same. That is, Ho: pA = pB. To conduct the test, we use a pooled
proportion, pc.

*This  content  is  available  online  at  <http://cnx.org/content/ml7043/1.12/>. 
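The pooled proportion is calculated as follows:

\[ p_c = \frac{x_A + x_B}{n_A + n_B} \]  (10.7)

The distribution for the differences is:

\[ P'_A - P'_B \sim N\left[0, \sqrt{p_c (1 - p_c) \left(\frac{1}{n_A} + \frac{1}{n_B}\right)}\right] \]  (10.8)

The test statistic (z-score) is:

\[ z = \frac{(p'_A - p'_B) - (p_A - p_B)}{\sqrt{p_c (1 - p_c) \left(\frac{1}{n_A} + \frac{1}{n_B}\right)}} \]  (10.9)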


Example  10.4:  Two  population  proportions 

Two types of medication for hives are being tested to determine if there is a difference in the
proportions of adult patient reactions. Twenty out of a random sample of 200 adults given
medication A still had hives 30 minutes after taking the medication. Twelve out of another random
sample of 200 adults given medication B still had hives 30 minutes after taking the medication.
Test at a 1% level of significance.


(Solution  on  p.  466.) 


10.4.1  Determining  the  solution 

This  is  a  test  of  2  population  proportions. 

Problem 

How  do  you  know? 

Let A and B be the subscripts for medication A and medication B. Then pA and pB are the desired
population proportions.

Random Variable:

P'A − P'B = difference in the proportions of adult patients who did not react after 30 minutes to
medication A and medication B.

Ho: pA = pB    pA − pB = 0

Ha: pA ≠ pB    pA − pB ≠ 0

The words "is a difference" tell you the test is two-tailed.

Distribution for the test: Since this is a test of two binomial population proportions, the
distribution is normal:

\[ p_c = \frac{x_A + x_B}{n_A + n_B} = \frac{20 + 12}{200 + 200} = 0.08 \qquad 1 - p_c = 0.92 \]

Therefore,

\[ P'_A - P'_B \sim N\left[0, \sqrt{(0.08)(0.92)\left(\frac{1}{200} + \frac{1}{200}\right)}\right] \]

P'A − P'B follows an approximate normal distribution.

Calculate  the  p-value  using  the  normal  distribution:  p-value  =  0.1404. 
Estimated proportion for group A: p'A = xA/nA = 20/200 = 0.1
Estimated proportion for group B: p'B = xB/nB = 12/200 = 0.06
Graph:

[Figure 10.3: graph of the p-value; from Ho, pA − pB = 0]

p'A − p'B = 0.1 − 0.06 = 0.04.

Half the p-value is below −0.04 and half is above 0.04.

Compare α and the p-value: α = 0.01 and the p-value = 0.1404. α < p-value.
Make a decision: Since α < p-value, do not reject Ho.

Conclusion:  At  a  1%  level  of  significance,  from  the  sample  data,  there  is  not  sufficient  evidence  to 
conclude  that  there  is  a  difference  in  the  proportions  of  adult  patients  who  did  not  react  after  30 
minutes  to  medication  A  and  medication  B. 

NOTE: TI-83+ and TI-84: Press STAT. Arrow over to TESTS and press 6:2-PropZTest. Arrow down
and enter 20 for x1, 200 for n1, 12 for x2, and 200 for n2. Arrow down to p1: and arrow to not
equal p2. Press ENTER. Arrow down to Calculate and press ENTER. The p-value is p = 0.1404
and the test statistic is 1.47. Do the procedure again but instead of Calculate do Draw.
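
The pooled-proportion calculation in Example 10.4 can be reproduced as follows. This is a minimal Python sketch using only the standard library; the function name two_prop_z is hypothetical and not part of the original text.

```python
from math import sqrt
from statistics import NormalDist

def two_prop_z(x_a, n_a, x_b, n_b):
    """Return (pooled proportion, z-score) under Ho: pA - pB = 0."""
    p_a = x_a / n_a                      # estimated proportion, group A
    p_b = x_b / n_b                      # estimated proportion, group B
    p_c = (x_a + x_b) / (n_a + n_b)      # pooled proportion
    se = sqrt(p_c * (1 - p_c) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    return p_c, z

# Medication A: 20 of 200 still had hives; medication B: 12 of 200
p_c, z = two_prop_z(20, 200, 12, 200)

# Two-tailed test: double the area beyond |z| under the standard normal
p_two_tailed = 2 * (1 - NormalDist().cdf(abs(z)))
print(round(p_c, 2), round(z, 2), round(p_two_tailed, 4))
```

Since the p-value exceeds α = 0.01, the sketch agrees with the decision in the example: do not reject Ho.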


10.5  Matched  or  Paired  Samples^ 

1.  Simple  random  sampling  is  used. 

2.  Sample  sizes  are  often  small. 

3.  Two  measurements  (samples)  are  drawn  from  the  same  pair  of  individuals  or  objects. 

4.  Differences  are  calculated  from  the  matched  or  paired  samples. 

5.  The  differences  form  the  sample  that  is  used  for  the  hypothesis  test. 

6. The matched pairs have differences that either come from a population that is normal or the number of
differences is sufficiently large so the distribution of the sample mean of differences is approximately
normal.

^This content is available online at <http://cnx.org/content/m17033/1.15/>.

In a hypothesis test for matched or paired samples, subjects are matched in pairs and differences are
calculated. The differences are the data. The population mean for the differences, μd, is then tested using
a Student's t-test for a single population mean with n − 1 degrees of freedom, where n is the number of
differences.

The test statistic (t-score) is:

\[ t = \frac{\bar{x}_d - \mu_d}{\left(\frac{s_d}{\sqrt{n}}\right)} \]  (10.10)


Example  10.5:  Matched  or  paired  samples 

A study was conducted to investigate the effectiveness of hypnotism in reducing pain. Results
for randomly selected subjects are shown in the table. The "before" value is matched to an "after"
value and the differences are calculated. The differences have a normal distribution.

Subject:   A     B     C     D      E      F     G     H
Before     6.6   6.5   9.0   10.3   11.3   8.1   6.3   11.6
After      6.8   2.4   7.4   8.5    8.1    6.1   3.4   2.0

Table 10.3

Problem 

Are the sensory measurements, on average, lower after hypnotism? Test at a 5% significance level.
Solution

Corresponding "before" and "after" values form matched pairs. (Calculate "after" − "before".)


After Data   Before Data   Difference
6.8          6.6           0.2
2.4          6.5           -4.1
7.4          9             -1.6
8.5          10.3          -1.8
8.1          11.3          -3.2
6.1          8.1           -2
3.4          6.3           -2.9
2            11.6          -9.6

Table 10.4

The data for the test are the differences: {0.2, -4.1, -1.6, -1.8, -3.2, -2, -2.9, -9.6}

The sample mean and sample standard deviation of the differences are: x̄d = −3.13 and
sd = 2.91. Verify these values.

Let μd be the population mean for the differences. We use the subscript d to denote "differences."
Random Variable: X̄d = the mean difference of the sensory measurements

Ho: μd ≥ 0 (10.11)
There is no improvement. (μd is the population mean of the differences.)

Ha: μd < 0 (10.12)

There  is  improvement.  The  score  should  be  lower  after  hypnotism  so  the  difference  ought  to  be 
negative  to  indicate  improvement. 

Distribution for the test: The distribution is a Student-t with df = n - 1 = 8 - 1 = 7. Use $t_7$.
(Notice  that  the  test  is  for  a  single  population  mean.) 

Calculate  the  p-value  using  the  Student-t  distribution:  p-value  =  0.0095 

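Graph:

[Figure 10.4: graph of the Student-t distribution, with the hypothesized mean 0 (from $H_o$) and the sample mean difference -3.13 marked on the horizontal axis and the left tail shaded. p-value = 0.0095]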


$\bar{X}_d$ is the random variable for the differences.
The sample mean and sample standard deviation of the differences are:

$\bar{x}_d = -3.13$
$s_d = 2.91$

Compare $\alpha$ and the p-value: $\alpha$ = 0.05 and p-value = 0.0095. $\alpha$ > p-value.
Make a decision: Since $\alpha$ > p-value, reject $H_o$.
This means that $\mu_d < 0$ and there is improvement.

Conclusion: At a 5% level of significance, from the sample data, there is sufficient evidence to
conclude that the sensory measurements, on average, are lower after hypnotism. Hypnotism appears
to be effective in reducing pain.

NOTE: For the TI-83+ and TI-84 calculators, you can either calculate the differences ahead of time
(after - before) and put the differences into a list or you can put the after data into a first list and
the before data into a second list. Then go to a third list and arrow up to the name. Enter 1st list
name - 2nd list name. The calculator will do the subtraction and you will have the differences in
the third list.


NOTE: TI-83+ and TI-84: Use your list of differences as the data. Press STAT and arrow over to
TESTS. Press 2:T-Test. Arrow over to Data and press ENTER. Arrow down and enter 0 for $\mu_0$, the
name of the list where you put the data, and 1 for Freq:. Arrow down to $\mu$: and arrow over to <
$\mu_0$. Press ENTER. Arrow down to Calculate and press ENTER. The p-value is 0.0094 and the test
statistic is -3.04. Do these instructions again except arrow to Draw (instead of Calculate). Press
ENTER.
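For readers without a TI calculator, the same test can be sketched in Python using only the standard library. This is an illustrative approximation, not the calculator's algorithm: the helper names `t_pdf` and `t_upper_tail` are invented here, and the tail area is obtained by numerically integrating the Student-t density with Simpson's rule.

```python
import math
from statistics import mean, stdev

def t_pdf(t, df):
    """Density of the Student-t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """Approximate P(T > |t|) using Simpson's rule on [0, |t|] (steps must be even)."""
    b = abs(t)
    h = b / steps
    total = t_pdf(0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

# Differences (after - before) from Example 10.5.
differences = [0.2, -4.1, -1.6, -1.8, -3.2, -2.0, -2.9, -9.6]
n = len(differences)
df = n - 1

# Test statistic for a single mean, with hypothesized mean difference 0.
t_stat = (mean(differences) - 0) / (stdev(differences) / math.sqrt(n))

# Left-tailed test: by symmetry, P(T < t_stat) equals P(T > |t_stat|) when t_stat < 0.
p_value = t_upper_tail(t_stat, df)

print(round(t_stat, 2), round(p_value, 4))
```

The symmetry shortcut in the last step only applies because the test statistic is negative and the test is left-tailed; a right-tailed or two-tailed test would need a different final step.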


Example  10.6 

A college football coach was interested in whether the college's strength development class
increased his players' maximum lift (in pounds) on the bench press exercise. He asked 4 of his
players to participate in a study. The amount of weight they could each lift was recorded before
they took the strength development class. After completing the class, the amount of weight they
could each lift was again measured. The data are as follows:


Weight (in pounds)                           Player 1   Player 2   Player 3   Player 4
Amount of weight lifted prior to the class   205        241        338        368
Amount of weight lifted after the class      295        252        330        360

Table 10.5


The  coach  wants  to  know  if  the  strength  development  class  makes  his  players  stronger,  on 
average. 

Problem  (Solution  on  p.  466.) 

Record  the  differences  data.  Calculate  the  differences  by  subtracting  the  amount  of  weight  lifted 
prior  to  the  class  from  the  weight  lifted  after  completing  the  class.  The  data  for  the  differences  are: 
{90, 11,  -8,  -8}.  The  differences  have  a  normal  distribution. 

Using  the  differences  data,  calculate  the  sample  mean  and  the  sample  standard  deviation. 
$\bar{x}_d = 21.3$, $s_d = 46.7$

Using  the  difference  data,  this  becomes  a  test  of  a  single  (fill  in  the  blank). 

Define the random variable: $\bar{X}_d$ = mean difference in the maximum lift per player.
The distribution for the hypothesis test is $t_3$.
$H_o: \mu_d \leq 0$, $H_a: \mu_d > 0$
Graph:

[Figure 10.5: graph of the Student-t distribution with the p-value shaded. p-value = 0.2150]


Calculate  the  p-value:  The  p-value  is  0.2150 

Decision: If the level of significance is 5%, the decision is to not reject the null hypothesis because
$\alpha$ < p-value.
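The decision rule used in Examples 10.5 and 10.6 is the same comparison each time, and can be written as a small function. This is a sketch only: the name `decide` is hypothetical, and treating a tie ($\alpha$ equal to the p-value) as "do not reject" is an assumption made here, since the examples do not cover that case.

```python
def decide(alpha, p_value):
    """Compare the significance level with the p-value and return the decision."""
    if alpha > p_value:
        return "reject the null hypothesis"
    # Assumption: a tie (alpha == p_value) is treated as "do not reject".
    return "do not reject the null hypothesis"

print(decide(0.05, 0.0095))  # Example 10.5 (hypnotism)
print(decide(0.05, 0.2150))  # Example 10.6 (strength class)
```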

What  is  the  conclusion? 
Example  10.7 

Seven  eighth  graders  at  Kennedy  Middle  School  measured  how  far  they  could  push  the  shot-put 
with  their  dominant  (writing)  hand  and  their  weaker  (non-writing)  hand.  They  thought  that  they 
could  push  equal  distances  with  either  hand.  The  following  data  was  collected. 


Distance (in feet) using   Student 1   Student 2   Student 3   Student 4   Student 5   Student 6   Student 7
Dominant Hand              30          26          34          17          19          26          20
Weaker Hand                28          14          27          18          17          26          16

Table 10.6


Problem  (Solution  on  p.  466.) 

Conduct  a  hypothesis  test  to  determine  whether  the  mean  difference  in  distances  between  the 
children's  dominant  versus  weaker  hands  is  significant. 

Hint:  use  a  t-test  on  the  difference  data.  Assume  the  differences  have  a  normal  distribution.  The 
random  variable  is  the  mean  difference. 
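The hint can be followed in a few lines of Python. The sketch below assumes the differences are taken as dominant minus weaker (the text does not state the order) and computes only the test statistic; the variable names are chosen for this illustration.

```python
import math
from statistics import mean, stdev

# Data from Table 10.6 (Students 1 through 7).
dominant = [30, 26, 34, 17, 19, 26, 20]
weaker = [28, 14, 27, 18, 17, 26, 16]

# Assumption: difference = dominant hand minus weaker hand.
differences = [d - w for d, w in zip(dominant, weaker)]
n = len(differences)

# t statistic for a single mean with hypothesized mean difference 0, df = n - 1.
t_stat = (mean(differences) - 0) / (stdev(differences) / math.sqrt(n))

print(differences)
print(round(t_stat, 2))
```

Reversing the order of subtraction would flip the sign of the test statistic but would not change a two-tailed p-value.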

Check:  The  test  statistic  is  2.18  and  the  p-value  is  0.0716. 
What is your conclusion?

10.6 Summary of Types of Hypothesis Tests*

Two  Population  Means 

•  Populations are independent and population standard deviations are unknown.

•  Populations  are  independent  and  population  standard  deviations  are  known  (not  likely). 

Matched  or  Paired  Samples 

•  Two  samples  are  drawn  from  the  same  set  of  objects. 

•  Samples  are  dependent. 

Two  Population  Proportions 

•  Populations  are  independent. 


*This content is available online at <http://cnx.org/content/m17044/1.5/>.

10.7 Practice 1: Hypothesis Testing for Two Proportions*

10.7.1  Student  Learning  Outcomes 

•  The  student  will  conduct  a  hypothesis  test  of  two  proportions. 


10.7.2  Given 

In the recent Census, 3 percent of the U.S. population reported being two or more
races. However, the percent varies tremendously from state to state. (Source:
http://www.census.gov/prod/cen2010/briefs/c2010br-02.pdf) Suppose that two random surveys
are conducted. In the first random survey, out of 1000 North Dakotans, only 9 people reported being of
two or more races. In the second random survey, out of 500 Nevadans, 17 people reported being of two
or more races. Conduct a hypothesis test to determine if the population percents are the same for the two
states or if the percent for Nevada is statistically higher than for North Dakota.
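To check a hand calculation for this practice, the arithmetic of a two-proportion test statistic can be sketched as follows. The sketch assumes the usual pooled-proportion form of the statistic and takes the difference as North Dakota minus Nevada; both choices, and all variable names, are assumptions of this illustration rather than something stated in the problem.

```python
import math

# Survey counts given in the problem statement.
x_nd, n_nd = 9, 1000   # North Dakota: successes, sample size
x_nv, n_nv = 17, 500   # Nevada: successes, sample size

p_nd = x_nd / n_nd     # sample proportion, North Dakota
p_nv = x_nv / n_nv     # sample proportion, Nevada

# Pooled proportion: all successes over all observations.
p_pooled = (x_nd + x_nv) / (n_nd + n_nv)

# Standard error of the difference under the pooled-proportion assumption.
se = math.sqrt(p_pooled * (1 - p_pooled) * (1 / n_nd + 1 / n_nv))

# Assumption: difference taken as North Dakota minus Nevada.
z = (p_nd - p_nv) / se

print(round(p_nd, 3), round(p_nv, 3), round(p_pooled, 4), round(z, 2))
```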


10.7.3  Hypothesis  Testing:  Two  Proportions 

Exercise  10.7.1  (Solution  on  p.  466.) 

Is  this  a  test  of  means  or  proportions? 

Exercise  10.7.2  (Solution  on  p.  466.) 
State the null and alternative hypotheses.

a.  Ho  : 

b.  Ha: 


Exercise  10.7.3  (Solution  on  p.  466.) 

Is  this  a  right-tailed,  left-tailed,  or  two-tailed  test?  How  do  you  know? 

Exercise  10.7.4 

What  is  the  Random  Variable  of  interest  for  this  test? 
Exercise  10.7.5 

In  words,  define  the  Random  Variable  for  this  test. 

Exercise  10.7.6  (Solution  on  p.  466.) 

Which distribution (Normal or Student's-t) would you use for this hypothesis test?

Exercise  10.7.7 

Explain  why  you  chose  the  distribution  you  did  for  the  above  question. 

Exercise  10.7.8  (Solution  on  p.  466.) 

Calculate  the  test  statistic. 

Exercise  10.7.9 

Sketch  a  graph  of  the  situation.  Mark  the  hypothesized  difference  and  the  sample  difference. 
Shade the area corresponding to the p-value.


*This content is available online at <http://cnx.org/content/m17027/1.13/>.

Figure 10.6


Exercise  10.7.10  (Solution  on  p.  466.) 

Find the p-value:

Exercise  10.7.11  (Solution  on  p.  466.) 

At a pre-conceived $\alpha$ = 0.05, what is your:

a.  Decision: 

b.  Reason  for  the  decision: 

c.  Conclusion  (write  out  in  a  complete  sentence): 


10.7.4  Discussion  Question 
Exercise  10.7.12 

Does it appear that the proportion of Nevadans who are two or more races is higher than the
proportion of North Dakotans? Why or why not?

10.8 Practice 2: Hypothesis Testing for Two Averages*

10.8.1  Student  Learning  Outcome 

•  The  student  will  conduct  a  hypothesis  test  of  two  means. 


10.8.2  Given 

The U.S. Center for Disease Control reports that the mean life expectancy for whites born in 1900 was
47.6 years and for nonwhites it was 33.0 years. (http://www.cdc.gov/nchs/data/dvs/nvsr53_06t12.pdf)
Suppose that you randomly survey death records for people born in 1900 in a certain county. Of the 124
whites, the mean life span was 45.3 years with a standard deviation of 12.7 years. Of the 82 nonwhites, the
mean life span was 34.1 years with a standard deviation of 15.6 years. Conduct a hypothesis test to see if
the mean life spans in the county were the same for whites and nonwhites.
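To check a hand calculation for this practice, the sketch below computes a two-sample t statistic from the summary statistics above. It assumes the unpooled (Welch) form of the standard error and the Welch-Satterthwaite approximation for the degrees of freedom; the function name `welch_t` and its parameter names are hypothetical.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Unpooled two-sample t statistic and approximate degrees of freedom."""
    v1 = sd1 ** 2 / n1  # estimated variance of the first sample mean
    v2 = sd2 ** 2 / n2  # estimated variance of the second sample mean
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# Summary statistics from the county death records (whites first, then nonwhites).
t_stat, df = welch_t(45.3, 12.7, 124, 34.1, 15.6, 82)

print(round(t_stat, 2), round(df, 1))
```

The degrees of freedom from this approximation are generally not a whole number, which is one reason the calculation is usually left to software.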


10.8.3  Hypothesis  Testing:  Two  Means 

Exercise  10.8.1  (Solution  on  p.  467.) 
Is  this  a  test  of  means  or  proportions? 

Exercise  10.8.2  (Solution  on  p.  467.) 

State the null and alternative hypotheses.

a.  Ho: 

b.  Ha  : 


Exercise  10.8.3  (Solution  on  p.  467.) 

Is  this  a  right-tailed,  left-tailed,  or  two-tailed  test?  How  do  you  know? 

Exercise  10.8.4  (Solution  on  p.  467.) 

What  is  the  Random  Variable  of  interest  for  this  test? 

Exercise  10.8.5  (Solution  on  p.  467.) 

In  words,  define  the  Random  Variable  of  interest  for  this  test. 

Exercise  10.8.6 

Which distribution (Normal or Student's-t) would you use for this hypothesis test?
Exercise  10.8.7 

Explain  why  you  chose  the  distribution  you  did  for  the  above  question. 

Exercise  10.8.8  (Solution  on  p.  467.) 

Calculate  the  test  statistic. 

Exercise  10.8.9 

Sketch  a  graph  of  the  situation.  Label  the  horizontal  axis.  Mark  the  hypothesized  difference  and 
the sample difference. Shade the area corresponding to the p-value.

*This content is available online at <http://cnx.org/content/m17039/1.12/>.

Figure 10.7


Exercise 10.8.10  (Solution on p. 467.)

Find the p-value:

Exercise 10.8.11  (Solution on p. 467.)

At a pre-conceived $\alpha$ = 0.05, what is your:

a.  Decision: 

b.  Reason  for  the  decision: 

c.  Conclusion  (write  out  in  a  complete  sentence): 


10.8.4  Discussion  Question 
Exercise  10.8.12 

Does it appear that the means are the same? Why or why not?

10.9 Homework

For  questions  Exercise  10.9.1  -  Exercise  10.9.10,  indicate  which  of  the  following  choices  best  identifies  the 
hypothesis  test. 

A.  Independent  group  means,  population  standard  deviations  and/or  variances  known 

B.  Independent group means, population standard deviations and/or variances unknown

C.  Matched  or  paired  samples 

D.  Single  mean 

E.  2  proportions 

F.  Single  proportion 

Exercise  10.9.1  (Solution  on  p.  467.) 

A powder diet is tested on 49 people and a liquid diet is tested on 36 different people. The
population standard deviations are 2 pounds and 3 pounds, respectively. Of interest is whether the
liquid diet yields a higher mean weight loss than the powder diet.

Exercise  10.9.2 

A  new  chocolate  bar  is  taste-tested  on  consumers.  Of  interest  is  whether  the  proportion  of  children 
that  like  the  new  chocolate  bar  is  greater  than  the  proportion  of  adults  that  like  it. 

Exercise  10.9.3  (Solution  on  p.  467.) 

The  mean  number  of  English  courses  taken  in  a  two-year  time  period  by  male  and  female  college 
students  is  believed  to  be  about  the  same.  An  experiment  is  conducted  and  data  are  collected  from 
9  males  and  16  females. 

Exercise  10.9.4 

A  football  league  reported  that  the  mean  number  of  touchdowns  per  game  was  5.  A  study  is  done 
to  determine  if  the  mean  number  of  touchdowns  has  decreased. 

Exercise  10.9.5  (Solution  on  p.  467.) 

A study is done to determine if students in the California state university system take longer to
graduate than students enrolled in private universities. 100 students from both the California state
university system and private universities are surveyed. From years of research, it is known that
the population standard deviations are 1.5811 years and 1 year, respectively.

Exercise  10.9.6 

According  to  a  YWCA  Rape  Crisis  Center  newsletter,  75%  of  rape  victims  know  their  attackers.  A 
study  is  done  to  verify  this. 

Exercise  10.9.7  (Solution  on  p.  467.) 

According to a recent study, U.S. companies have a mean maternity leave of six weeks.

Exercise  10.9.8 

A  recent  drug  survey  showed  an  increase  in  use  of  drugs  and  alcohol  among  local  high  school 
students  as  compared  to  the  national  percent.  Suppose  that  a  survey  of  100  local  youths  and  100 
national  youths  is  conducted  to  see  if  the  proportion  of  drug  and  alcohol  use  is  higher  locally  than 
nationally. 

Exercise  10.9.9  (Solution  on  p.  467.) 

A  new  SAT  study  course  is  tested  on  12  individuals.   Pre-course  and  post-course  scores  are 
recorded.  Of  interest  is  the  mean  increase  in  SAT  scores. 

Exercise  10.9.10 

University of Michigan researchers reported in the Journal of the National Cancer Institute that
quitting smoking is especially beneficial for those under age 49. In this American Cancer Society
study, the risk (probability) of dying of lung cancer was about the same as for those who had never
smoked.


10.9.1 

Directions: For each of the word problems, use a solution sheet to do the hypothesis test. The
solution sheet is found in 14. Appendix (online book version: the link is "Solution Sheets"; PDF
book version: look under 14.5 Solution Sheets). Please feel free to make copies of the solution
sheets. For the online version of the book, it is suggested that you copy the .doc or the .pdf files.


NOTE: If you are using a Student's-t distribution for a homework problem below, including for
paired  data,  you  may  assume  that  the  underlying  population  is  normally  distributed.  (In  general, 
you  must  first  prove  that  assumption,  though.) 

Exercise  10.9.11  (Solution  on  p.  467.) 

A  powder  diet  is  tested  on  49  people  and  a  liquid  diet  is  tested  on  36  different  people.  Of  interest 
is  whether  the  liquid  diet  yields  a  higher  mean  weight  loss  than  the  powder  diet.  The  powder  diet 
group had a mean weight loss of 42 pounds with a standard deviation of 12 pounds. The liquid
diet group had a mean weight loss of 45 pounds with a standard deviation of 14 pounds.

Exercise  10.9.12 

The  mean  number  of  English  courses  taken  in  a  two-year  time  period  by  male  and  female  college 
students  is  believed  to  be  about  the  same.  An  experiment  is  conducted  and  data  are  collected  from 
29  males  and  16  females.  The  males  took  an  average  of  3  English  courses  with  a  standard  deviation 
of  0.8.  The  females  took  an  average  of  4  English  courses  with  a  standard  deviation  of  1.0.  Are  the 
means  statistically  the  same? 

Exercise  10.9.13  (Solution  on  p.  467.) 

A study is done to determine if students in the California state university system take longer
to  graduate,  on  average,  than  students  enrolled  in  private  universities.  100  students  from  both 
the  California  state  university  system  and  private  universities  are  surveyed.  Suppose  that  from 
years  of  research,  it  is  known  that  the  population  standard  deviations  are  1.5811  years  and  1  year, 
respectively.  The  following  data  are  collected.  The  California  state  university  system  students 
took  on  average  4.5  years  with  a  standard  deviation  of  0.8.  The  private  university  students  took 
on  average  4.1  years  with  a  standard  deviation  of  0.3. 

Exercise  10.9.14 

A new SAT study course is tested on 12 individuals. Pre-course and post-course scores are
recorded. Of interest is the mean increase in SAT scores. The following data are collected:

Pre-course score   Post-course score
1200               1300
960                920
1010               1100
840                880
1100               1070
1250               1320
860                860
1330               1370
790                770
990                1040
1110               1200
740                850

Table 10.7


Exercise  10.9.15  (Solution  on  p.  467.) 

A recent drug survey showed an increase in use of drugs and alcohol among local high school
seniors  as  compared  to  the  national  percent.  Suppose  that  a  survey  of  100  local  seniors  and  100 
national  seniors  is  conducted  to  see  if  the  proportion  of  drug  and  alcohol  use  is  higher  locally  than 
nationally.  Locally,  65  seniors  reported  using  drugs  or  alcohol  within  the  past  month,  while  60 
national  seniors  reported  using  them. 

Exercise  10.9.16 

A  student  at  a  four-year  college  claims  that  mean  enrollment  at  four-year  colleges  is  higher  than 
at  two-year  colleges  in  the  United  States.  Two  surveys  are  conducted.  Of  the  35  two-year  colleges 
surveyed,  the  mean  enrollment  was  5068  with  a  standard  deviation  of  4777.  Of  the  35  four-year 
colleges surveyed, the mean enrollment was 5466 with a standard deviation of 8191. (Source:
Microsoft  Bookshelf) 

Exercise  10.9.17  (Solution  on  p.  467.) 

A study was conducted by the U.S. Army to see if applying antiperspirant to soldiers' feet for a
few days before a major hike would help cut down on the number of blisters soldiers had on their
feet. In the experiment, for three nights before they went on a 13-mile hike, a group of 328 West
Point cadets put an alcohol-based antiperspirant on their feet. A "control group" of 339 soldiers
put on a similar, but inactive, preparation on their feet. On the day of the hike, the temperature
reached 83 °F. At the end of the hike, 21% of the soldiers who had used the antiperspirant and 48%
of the control group had developed foot blisters. Conduct a hypothesis test to see if the proportion
of soldiers using the antiperspirant was significantly lower than the control group. (Source: U.S.
Army study reported in Journal of the American Academy of Dermatologists)

Exercise  10.9.18 

We are interested in whether the proportions of female suicide victims for ages 15 to 24 are the
same for the white and the black races in the United States. We randomly pick one year, 1992,
to compare the races. The number of suicides estimated in the United States in 1992 for white
females is 4930. 580 were aged 15 to 24. The estimate for black females is 330. 40 were aged 15 to
24. We will let female suicide victims be our population. (Source: the National Center for Health
Statistics, U.S. Dept. of Health and Human Services)

Exercise 10.9.19  (Solution on p. 468.)

At  Rachel's  11th  birthday  party,  8  girls  were  timed  to  see  how  long  (in  seconds)  they  could  hold 
their  breath  in  a  relaxed  position.  After  a  two-minute  rest,  they  timed  themselves  while  jumping. 
The  girls  thought  that  the  mean  difference  between  their  jumping  and  relaxed  times  would  be  0. 
Test  their  hypothesis. 


Relaxed time (seconds)   Jumping time (seconds)
26                       21
47                       40
30                       28
22                       21
23                       25
45                       43
37                       35
29                       32

Table 10.8


Exercise  10.9.20 

Elizabeth Mjelde, an art history professor, was interested in whether the value from the Golden
Ratio formula, $\frac{\text{larger + smaller dimension}}{\text{larger dimension}}$, was the same in the Whitney Exhibit for works from 1900
- 1919 as for works from 1920 - 1942. 37 early works were sampled. They averaged 1.74 with
a standard deviation of 0.11. 65 of the later works were sampled. They averaged 1.746 with a
standard deviation of 0.1064. Do you think that there is a significant difference in the Golden
Ratio calculation? (Source: data from Whitney Exhibit on loan to San Jose Museum of Art)

Exercise  10.9.21  (Solution  on  p.  468.) 

One of the questions in a study of marital satisfaction of dual-career couples was to rate the
statement, "I'm pleased with the way we divide the responsibilities for childcare." The ratings went
from 1 (strongly agree) to 5 (strongly disagree). Below are ten of the paired responses for
husbands and wives. Conduct a hypothesis test to see if the mean difference in the husband's versus
the wife's satisfaction level is negative (meaning that, within the partnership, the husband is
happier than the wife).


Wife's score      2   2   3   3   4   2   1   1   2   4
Husband's score   2   2   1   3   2   1   1   1   2   4

Table 10.9


Exercise  10.9.22 

Ten individuals went on a low-fat diet for 12 weeks to lower their cholesterol. Evaluate the data
below. Do you think that their cholesterol levels were significantly lowered?

Starting cholesterol level   Ending cholesterol level
140                          140
220                          230
110                          120
240                          220
200                          190
180                          150
190                          200
360                          300
280                          300
260                          240

Table 10.10


Exercise  10.9.23  (Solution  on  p.  468.) 

Mean entry level salaries for college graduates with mechanical engineering degrees and
electrical engineering degrees are believed to be approximately the same. (Source:
http://www.graduatingengineer.com). A recruiting office thinks that the mean mechanical
engineering salary is actually lower than the mean electrical engineering salary. The recruiting office
randomly surveys 50 entry level mechanical engineers and 60 entry level electrical engineers. Their
mean salaries were $46,100 and $46,700, respectively. Their standard deviations were $3450 and
$4210, respectively. Conduct a hypothesis test to determine if you agree that the mean entry level
mechanical engineering salary is lower than the mean entry level electrical engineering salary.

Exercise  10.9.24 

A  recent  year  was  randomly  picked  from  1985  to  the  present.  In  that  year,  there  were  2051  Hispanic 
students  at  Cabrillo  College  out  of  a  total  of  12,328  students.  At  Lake  Tahoe  College,  there  were 
321  Hispanic  students  out  of  a  total  of  2441  students.  In  general,  do  you  think  that  the  percent 
of  Hispanic  students  at  the  two  colleges  is  basically  the  same  or  different?  (Source:  Chancellor's 
Office, California Community Colleges, November 1994)

Exercise  10.9.25  (Solution  on  p.  468.) 

Eight runners were convinced that the mean difference in their individual times for running one
mile versus race walking one mile was at most 2 minutes. Below are their times. Do you agree
that the mean difference is at most 2 minutes?

Running time (minutes)   Race walking time (minutes)
5.1                      7.3
5.6                      9.2
6.2                      10.4
4.8                      6.9
7.1                      8.9
4.2                      9.5
6.1                      9.4
4.4                      7.9

Table 10.11


Exercise  10.9.26 

Marketing  companies  have  collected  data  implying  that  teenage  girls  use  more  ring  tones  on  their 
cellular  phones  than  teenage  boys  do.  In  one  particular  study  of  40  randomly  chosen  teenage  girls 
and boys (20 of each) with cellular phones, the mean number of ring tones for the girls was 3.2
with  a  standard  deviation  of  1.5.  The  mean  for  the  boys  was  1.7  with  a  standard  deviation  of  0.8. 
Conduct  a  hypothesis  test  to  determine  if  the  means  are  approximately  the  same  or  if  the  girls' 
mean  is  higher  than  the  boys'  mean. 

Exercise  10.9.27  (Solution  on  p.  468.) 

While  her  husband  spent  IVi  hours  picking  out  new  speakers,  a  statistician  decided  to  determine 
whether the percent of men who enjoy shopping for electronic equipment is higher than the percent
of women who enjoy shopping for electronic equipment. The population was Saturday afternoon
shoppers. Out of 67 men, 24 said they enjoyed the activity. 8 of the 24 women surveyed
claimed  to  enjoy  the  activity.  Interpret  the  results  of  the  survey. 

Exercise  10.9.28 

We  are  interested  in  whether  children's  educational  computer  software  costs  less,  on  average, 
than  children's  entertainment  software.  36  educational  software  titles  were  randomly  picked  from 
a  catalog.  The  mean  cost  was  $31.14  with  a  standard  deviation  of  $4.69.  35  entertainment  software 
titles  were  randomly  picked  from  the  same  catalog.  The  mean  cost  was  $33.86  with  a  standard 
deviation  of  $10.87.  Decide  whether  children's  educational  software  costs  less,  on  average,  than 
children's  entertainment  software.  (Source:  Educational  Resources,  December  catalog) 

Exercise  10.9.29  (Solution  on  p.  468.) 

Parents  of  teenage  boys  often  complain  that  auto  insurance  costs  more,  on  average,  for  teenage 
boys  than  for  teenage  girls.  A  group  of  concerned  parents  examines  a  random  sample  of  insurance 
bills. The mean annual cost for 36 teenage boys was $679. For 23 teenage girls, it was $559. From
past  years,  it  is  known  that  the  population  standard  deviation  for  each  group  is  $180.  Determine 
whether  or  not  you  believe  that  the  mean  cost  for  auto  insurance  for  teenage  boys  is  greater  than 
that  for  teenage  girls. 

Exercise  10.9.30 

A  group  of  transfer  bound  students  wondered  if  they  will  spend  the  same  mean  amount  on  texts 
and  supplies  each  year  at  their  four-year  university  as  they  have  at  their  community  college.  They 
conducted  a  random  survey  of  54  students  at  their  community  college  and  66  students  at  their 
local  four-year  university.  The  sample  means  were  $947  and  $1011,  respectively.  The  population 
standard  deviations  are  known  to  be  $254  and  $87,  respectively.  Conduct  a  hypothesis  test  to 
determine if the means are statistically the same.

Exercise 10.9.31  (Solution on p. 468.)

Joan  Nguyen  recently  claimed  that  the  proportion  of  college-age  males  with  at  least  one  pierced 
ear  is  as  high  as  the  proportion  of  college-age  females.  She  conducted  a  survey  in  her  classes.  Out 
of  107  males,  20  had  at  least  one  pierced  ear.  Out  of  92  females,  47  had  at  least  one  pierced  ear.  Do 
you  believe  that  the  proportion  of  males  has  reached  the  proportion  of  females? 

Exercise  10.9.32 

Some  manufacturers  claim  that  non-hybrid  sedan  cars  have  a  lower  mean  miles  per  gallon  (mpg) 
than  hybrid  ones.  Suppose  that  consumers  test  21  hybrid  sedans  and  get  a  mean  of  31  mpg  with  a 
standard  deviation  of  7  mpg.  Thirty-one  non-hybrid  sedans  get  a  mean  of  22  mpg  with  a  standard 
deviation  of  4  mpg.  Suppose  that  the  population  standard  deviations  are  known  to  be  6  and  3, 
respectively. Conduct a hypothesis test to evaluate the manufacturers' claim.

Questions Exercise 10.9.33 - Exercise 10.9.37 refer to Terri Vogel's data set (see Table of Contents).

Exercise  10.9.33  (Solution  on  p.  468.) 

Using the data from Lap 1 only, conduct a hypothesis test to determine if the mean time for
completing a lap in races is the same as it is in practices.

Exercise  10.9.34 

Repeat  the  test  in  Exercise  10.9.33,  but  use  Lap  5  data  this  time. 

Exercise  10.9.35  (Solution  on  p.  469.) 

Repeat  the  test  in  Exercise  10.9.33,  but  this  time  combine  the  data  from  Laps  1  and  5. 

Exercise  10.9.36 

In  2  -  3  complete  sentences,  explain  in  detail  how  you  might  use  Terri  Vogel's  data  to  answer  the 
following  question.  "Does  Terri  Vogel  drive  faster  in  races  than  she  does  in  practices?" 

Exercise  10.9.37  (Solution  on  p.  469.) 

Is  the  proportion  of  race  laps  Terri  completes  slower  than  130  seconds  less  than  the  proportion  of 
practice  laps  she  completes  slower  than  135  seconds? 

Exercise  10.9.38 

"To  Breakfast  or  Not  to  Breakfast?"  by  Richard  Ayore 

In  the  American  society,  birthdays  are  one  of  those  days  that  everyone  looks  forward  to.  People  of 
different  ages  and  peer  groups  gather  to  mark  the  18th,  20th, . . .  birthdays.  During  this  time,  one 
looks  back  to  see  what  he  or  she  had  achieved  for  the  past  year,  and  also  focuses  ahead  for  more 
to  come. 

If,  by  any  chance,  I  am  invited  to  one  of  these  parties,  my  experience  is  always  different.  Instead 
of dancing around with my friends while the music is booming, I get carried away by memories
of  my  family  back  home  in  Kenya.  I  remember  the  good  times  I  had  with  my  brothers  and  sister 
while  we  did  our  daily  routine. 

Every  morning,  I  remember  we  went  to  the  shamba  (garden)  to  weed  our  crops.  I  remember  one 
day  arguing  with  my  brother  as  to  why  he  always  remained  behind  just  to  join  us  an  hour  later.  In 
his  defense,  he  said  that  he  preferred  waiting  for  breakfast  before  he  came  to  weed.  He  said,  "This 
is  why  I  always  work  more  hours  than  you  guys!" 

And so, to prove him wrong or right, we decided to give it a try. One day we went to work as usual
without  breakfast,  and  recorded  the  time  we  could  work  before  getting  tired  and  stopping.  On 
the  next  day,  we  all  ate  breakfast  before  going  to  work.  We  recorded  how  long  we  worked  again 
before  getting  tired  and  stopping.  Of  interest  was  our  mean  increase  in  work  time.  Though  not 
sure, my brother insisted that it is more than two hours. Using the data below, solve our problem.

Work hours with breakfast   Work hours without breakfast
8                           6
7                           5
9                           5
5                           4
9                           7
8                           7
10                          7
7                           5
6                           6
9                           5

Table 10.12


10.9.2  Try  these  multiple  choice  questions. 

For  questions  Exercise  10.9.39  -  Exercise  10.9.40,  use  the  following  information. 

A new AIDS prevention drug was tried on a group of 224 HIV positive patients. Forty-five (45) patients
developed AIDS after four years. In a control group of 224 HIV positive patients, 68 developed AIDS after
four years. We want to test whether the method of treatment reduces the proportion of patients that develop
AIDS after four years or if the proportions of the treated group and the untreated group stay the same.

Let the subscript t = treated patient and ut = untreated patient.

Exercise  10.9.39  (Solution  on  p.  469.) 

The appropriate hypotheses are:

A. $H_0: p_t < p_{ut}$ and $H_a: p_t \geq p_{ut}$

B. $H_0: p_t \leq p_{ut}$ and $H_a: p_t > p_{ut}$

C. $H_0: p_t = p_{ut}$ and $H_a: p_t \neq p_{ut}$

D. $H_0: p_t = p_{ut}$ and $H_a: p_t < p_{ut}$

Exercise  10.9.40  (Solution  on  p.  469.) 

If the p-value is 0.0062, what is the conclusion (use $\alpha = 0.05$)?

A.  The  method  has  no  effect. 

B. There is sufficient evidence to conclude that the method reduces the proportion of HIV positive patients that develop AIDS after four years.

C. There is sufficient evidence to conclude that the method increases the proportion of HIV positive patients that develop AIDS after four years.

D. There is insufficient evidence to conclude that the method reduces the proportion of HIV positive patients that develop AIDS after four years.
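
The p-value quoted in Exercise 10.9.40 can be checked with a pooled two-proportion z-test. The following is a minimal sketch, assuming a left-tailed alternative (the treated proportion is smaller) and using the counts stated above (45 of 224 treated, 68 of 224 untreated). The function name `two_proportion_z_test` is illustrative and does not come from the text.

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled z statistic and left-tailed p-value for H_a: p1 < p2."""
    p1 = x1 / n1
    p2 = x2 / n2
    # Pooled proportion, used because the null hypothesis says p1 = p2.
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / standard_error
    # Standard normal CDF evaluated at z gives the left-tail area.
    p_value = 0.5 * (1 + math.erf(z / math.sqrt(2)))
    return z, p_value


# Treated group: 45 of 224; control group: 68 of 224 (from the exercise).
z, p_value = two_proportion_z_test(45, 224, 68, 224)
print(round(z, 2), round(p_value, 4))
```

The computed p-value should land close to the 0.0062 given in the exercise; small differences can come from rounding in the calculator output.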

Exercise  10.9.41  (Solution  on  p.  469.) 

Lesley  E.  Tan  investigated  the  relationship  between  left-handedness  and  right-handedness  and 
motor competence in preschool children. Random samples of 41 left-handers and 41 right-handers
were given several tests of motor skills to determine if there is evidence of a difference between the
children  based  on  this  experiment.  The  experiment  produced  the  means  and  standard  deviations 
shown  below.  Determine  the  appropriate  test  and  best  distribution  to  use  for  that  test. 


                             Left-handed   Right-handed
Sample size                  41            41
Sample mean                  97.5          98.1
Sample standard deviation    17.5          19.2

Table 10.13


A.  Two  independent  means,  normal  distribution 

B.  Two  independent  means,  student's-t  distribution 

C.  Matched  or  paired  samples,  student's-t  distribution 

D.  Two  population  proportions,  normal  distribution 

For  questions  Exercise  10.9.42  -  Exercise  10.9.43,  use  the  following  information. 

An experiment is conducted to show that blood pressure can be consciously reduced in people trained in a
"biofeedback exercise program." Six (6) subjects were randomly selected and the blood pressure measurements
were recorded before and after the training. The difference between blood pressures was calculated
(after - before), producing the following results: $\bar{x}_d = -10.2$, $s_d = 8.4$. Using the data, test the hypothesis
that the blood pressure has decreased after the training.

Exercise  10.9.42  (Solution  on  p.  469.) 

The  distribution  for  the  test  is 

A. $t_5$

B. $t_6$

C. $N(-10.2, 8.4)$

D. $N\left(-10.2, \frac{8.4}{\sqrt{6}}\right)$

Exercise  10.9.43  (Solution  on  p.  469.) 

If $\alpha = 0.05$, the p-value and the conclusion are

A. 0.0014; There is sufficient evidence to conclude that the blood pressure decreased after the training

B. 0.0014; There is sufficient evidence to conclude that the blood pressure increased after the training

C. 0.0155; There is sufficient evidence to conclude that the blood pressure decreased after the training

D. 0.0155; There is sufficient evidence to conclude that the blood pressure increased after the training
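
The test statistic behind Exercises 10.9.42 and 10.9.43 can be sketched as follows. This assumes the two reported numbers are the sample mean of the differences (-10.2) and the sample standard deviation of the differences (8.4), with n = 6 subjects. The helper name `paired_t_statistic` is hypothetical.

```python
import math


def paired_t_statistic(mean_diff, sd_diff, n):
    """Return (t statistic, degrees of freedom) for a matched-pairs test of mu_d = 0."""
    standard_error = sd_diff / math.sqrt(n)
    t_stat = mean_diff / standard_error
    return t_stat, n - 1


# Summary values taken from the exercise statement (assumed interpretation).
t_stat, df = paired_t_statistic(-10.2, 8.4, 6)
print(round(t_stat, 2), df)
```

The left-tail area for this statistic under a Student's-t curve with the returned degrees of freedom can then be read from a calculator or table.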

For questions Exercise 10.9.44 - Exercise 10.9.45, use the following information.

The  Eastern  and  Western  Major  League  Soccer  conferences  have  a  new  Reserve  Division  that  allows  new 
players to develop their skills. Data for a randomly picked date showed the following annual goals.

Western              Eastern
Los Angeles 9        D.C. United 9
FC Dallas 3          Chicago 8
Chivas USA 4         Columbus 7
Real Salt Lake 3     New England 6
Colorado 4           MetroStars 5
San Jose 4           Kansas City 3

Table 10.14


Conduct a hypothesis test to determine if the Western Reserve Division teams score, on average, fewer goals
than the Eastern Reserve Division teams. Subscripts: 1 = Western Reserve Division (W); 2 = Eastern Reserve
Division (E)

Exercise  10.9.44  (Solution  on  p.  469.) 

The  exact  distribution  for  the  hypothesis  test  is: 

A.  The  normal  distribution. 

B.  The  student's-t  distribution. 

C.  The  uniform  distribution. 

D.  The  exponential  distribution. 


Exercise  10.9.45  (Solution  on  p.  469.) 

If  the  level  of  significance  is  0.05,  the  conclusion  is: 

A. There is sufficient evidence to conclude that the W Division teams score, on average, fewer goals than the E teams.

B. There is insufficient evidence to conclude that the W Division teams score, on average, more goals than the E teams.

C. There is insufficient evidence to conclude that the W teams score, on average, fewer goals than the E teams score.

D.  Unable  to  determine. 


Questions  Exercise  10.9.46  -  Exercise  10.9.48  refer  to  the  following. 

Neuroinvasive West Nile virus refers to a severe disease that affects a person's nervous system. It
is spread by the Culex species of mosquito. In the United States in 2010 there were 629 reported
cases of neuroinvasive West Nile virus out of a total of 1021 reported cases, and there were 486
neuroinvasive reported cases out of a total of 712 cases reported in 2011. Is the 2011 proportion of
neuroinvasive West Nile virus cases more than the 2010 proportion of neuroinvasive West Nile virus
cases? Using a 1% level of significance, conduct an appropriate hypothesis test. (Source:
http://www.cdc.gov/ncidod/dvbid/westnile/index.htm)

•  "2011"  subscript:  2011  group. 

•  "2010"  subscript:  2010  group 

Exercise  10.9.46  (Solution  on  p.  469.) 

This  is: 

A. a test of two proportions

B. a test of two independent means

C. a test of a single mean

D. a test of matched pairs.

Exercise  10.9.47  (Solution  on  p.  469.) 

An appropriate null hypothesis is:

A. $p_{2011} \leq p_{2010}$

B. $p_{2011} \geq p_{2010}$

C. $\mu_{2011} \leq \mu_{2010}$

D. $p_{2011} > p_{2010}$


Exercise  10.9.48  (Solution  on  p.  469.) 

The  p-value  is  0.0022.  At  a  1%  level  of  significance,  the  appropriate  conclusion  is 

A. There is sufficient evidence to conclude that the proportion of people in the United States in 2011 that got neuroinvasive West Nile disease is less than the proportion of people in the United States in 2010 that got neuroinvasive West Nile disease.

B. There is insufficient evidence to conclude that the proportion of people in the United States in 2011 that got neuroinvasive West Nile disease is more than the proportion of people in the United States in 2010 that got neuroinvasive West Nile disease.

C. There is insufficient evidence to conclude that the proportion of people in the United States in 2011 that got neuroinvasive West Nile disease is less than the proportion of people in the United States in 2010 that got neuroinvasive West Nile disease.

D. There is sufficient evidence to conclude that the proportion of people in the United States in 2011 that got neuroinvasive West Nile disease is more than the proportion of people in the United States in 2010 that got neuroinvasive West Nile disease.


Questions  Exercise  10.9.49  and  Exercise  10.9.50  refer  to  the  following: 

A  golf  instructor  is  interested  in  determining  if  her  new  technique  for  improving  players'  golf  scores  is 
effective.  She  takes  four  (4)  new  students.  She  records  their  18-holes  scores  before  learning  the  technique 
and  then  after  having  taken  her  class.  She  conducts  a  hypothesis  test.  The  data  are  as  follows. 


                           Player 1   Player 2   Player 3   Player 4
Mean score before class    83         78         93         87
Mean score after class     80         80         86         86

Table 10.15


Exercise  10.9.49  (Solution  on  p.  469.) 

This  is: 

A.  a  test  of  two  independent  means 

B.  a  test  of  two  proportions 

C.  a  test  of  a  single  proportion 

D.  a  test  of  matched  pairs. 

Exercise  10.9.50  (Solution  on  p.  469.) 

The  correct  decision  is: 


A. Reject $H_0$

B. Do not reject $H_0$

Questions Exercise 10.9.51 and Exercise 10.9.52 refer to the following:

Suppose  a  statistics  instructor  believes  that  there  is  no  significant  difference  between  the  mean  class  scores 
of  statistics  day  students  on  Exam  2  and  statistics  night  students  on  Exam  2.  She  takes  random  samples 
from  each  of  the  populations.  The  mean  and  standard  deviation  for  35  statistics  day  students  were  75.86 
and  16.91.  The  mean  and  standard  deviation  for  37  statistics  night  students  were  75.41  and  19.73.  The  "day" 
subscript  refers  to  the  statistics  day  students.  The  "night"  subscript  refers  to  the  statistics  night  students. 

Exercise  10.9.51  (Solution  on  p.  469.) 

An  appropriate  alternate  hypothesis  for  the  hypothesis  test  is: 

A. $\mu_{day} > \mu_{night}$

B. $\mu_{day} < \mu_{night}$

C. $\mu_{day} = \mu_{night}$

D. $\mu_{day} \neq \mu_{night}$

Exercise  10.9.52  (Solution  on  p.  469.) 

A  concluding  statement  is: 

A. There is sufficient evidence to conclude that the statistics night students' mean on Exam 2 is better than the statistics day students' mean on Exam 2.

B. There is insufficient evidence to conclude that the statistics day students' mean on Exam 2 is better than the statistics night students' mean on Exam 2.

C. There is insufficient evidence to conclude that there is a significant difference between the means of the statistics day students and night students on Exam 2.

D. There is sufficient evidence to conclude that there is a significant difference between the means of the statistics day students and night students on Exam 2.

10.10 Review

The  next  three  questions  refer  to  the  following  information: 
In a survey at Kirkwood Ski Resort the following information was recorded:

Sport Participation by Age

             0-10   11-20   21-40   40+
Ski          10     12      30      8
Snowboard    6      17      12      5

Table 10.16


Suppose that one person from the above was randomly selected.

Exercise  10.10.1  (Solution  on  p.  469.) 

Find  the  probability  that  the  person  was  a  skier  or  was  age  11-20. 

Exercise  10.10.2  (Solution  on  p.  469.) 

Find the probability that the person was a snowboarder given he/she was age 21 - 40.

Exercise  10.10.3  (Solution  on  p.  469.) 

Explain  which  of  the  following  are  true  and  which  are  false. 

a.  Sport  and  Age  are  independent  events. 

b.  Ski  and  age  11  -  20  are  mutually  exclusive  events. 

c. P(Ski and age 21 - 40) < P(Ski | age 21 - 40)

d. P(Snowboard or age 0 - 10) < P(Snowboard | age 0 - 10)


Exercise  10.10.4  (Solution  on  p.  470.) 

The  average  length  of  time  a  person  with  a  broken  leg  wears  a  cast  is  approximately  6  weeks. 
The  standard  deviation  is  about  3  weeks.  Thirty  people  who  had  recently  healed  from  broken 
legs  were  interviewed.  State  the  distribution  that  most  accurately  reflects  total  time  to  heal  for  the 
thirty  people. 

Exercise  10.10.5  (Solution  on  p.  470.) 

The distribution for X is Uniform. What can we say for certain about the distribution for $\bar{X}$ when
n = 1?

A. The distribution for $\bar{X}$ is still Uniform with the same mean and standard deviation as the distribution for X.

B. The distribution for $\bar{X}$ is Normal with a different mean and a different standard deviation than the distribution for X.

C. The distribution for $\bar{X}$ is Normal with the same mean but a larger standard deviation than the distribution for X.

D. The distribution for $\bar{X}$ is Normal with the same mean but a smaller standard deviation than the distribution for X.


Exercise  10.10.6  (Solution  on  p.  470.) 

The distribution for X is Uniform. What can we say for certain about the distribution for $\sum X$ when n = 50?

A. The distribution for $\sum X$ is still Uniform with the same mean and standard deviation as the distribution for X.

B. The distribution for $\sum X$ is Normal with the same mean but a larger standard deviation than the distribution for X.

C. The distribution for $\sum X$ is Normal with a larger mean and a larger standard deviation than the distribution for X.

D. The distribution for $\sum X$ is Normal with the same mean but a smaller standard deviation than the distribution for X.

This content is available online at <http://cnx.org/content/m17021/1.9/>.


The  next  three  questions  refer  to  the  following  information: 

A  group  of  students  measured  the  lengths  of  all  the  carrots  in  a  five-pound  bag  of  baby  carrots.  They 
calculated  the  average  length  of  baby  carrots  to  be  2.0  inches  with  a  standard  deviation  of  0.25  inches. 
Suppose we randomly survey 16 five-pound bags of baby carrots.

Exercise  10.10.7  (Solution  on  p.  470.) 

State the approximate distribution for $\bar{X}$, the distribution for the average lengths of baby carrots
in 16 five-pound bags. $\bar{X} \sim$

Exercise  10.10.8 

Explain  why  we  cannot  find  the  probability  that  one  individual  randomly  chosen  carrot  is  greater 
than  2.25  inches. 

Exercise  10.10.9  (Solution  on  p.  470.) 

Find the probability that $\bar{x}$ is between 2 and 2.25 inches.

The  next  three  questions  refer  to  the  following  information: 

At  the  beginning  of  the  term,  the  amount  of  time  a  student  waits  in  line  at  the  campus  store  is  normally 
distributed  with  a  mean  of  5  minutes  and  a  standard  deviation  of  2  minutes. 

Exercise  10.10.10  (Solution  on  p.  470.) 

Find  the  90th  percentile  of  waiting  time  in  minutes. 

Exercise  10.10.11  (Solution  on  p.  470.) 

Find  the  median  waiting  time  for  one  student. 

Exercise  10.10.12  (Solution  on  p.  470.) 

Find the probability that the average waiting time for 40 students is at least 4.5 minutes.

10.11 Lab: Hypothesis Testing for Two Means and Two Proportions

Class  Time: 
Names: 

10.11.1  Student  Learning  Outcomes: 

•  The  student  will  select  the  appropriate  distributions  to  use  in  each  case. 

• The student will conduct hypothesis tests and interpret the results.


10.11.2  Supplies: 

•  The  business  section  from  two  consecutive  days'  newspapers 

•  3  small  packages  of  M&Ms® 

•  5  small  packages  of  Reese's  Pieces® 


10.11.3  Increasing  Stocks  Survey 

Look at yesterday's newspaper business section. Conduct a hypothesis test to determine if the proportion
of New York Stock Exchange (NYSE) stocks that increased is greater than the proportion of NASDAQ stocks
that increased. As randomly as possible, choose 40 NYSE stocks and 32 NASDAQ stocks and complete the
following statements.

1. $H_0$:

2. $H_a$:

3. In words, define the Random Variable. =

4.  The  distribution  to  use  for  the  test  is: 

5.  Calculate  the  test  statistic  using  your  data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance.
a. Graph:

Figure 10.8

This content is available online at <http://cnx.org/content/m17022/1.13/>.

b.  Calculate  the  p-value: 

7.  Do  you  reject  or  not  reject  the  null  hypothesis?  Why? 

8.  Write  a  clear  conclusion  using  a  complete  sentence. 

10.11.4  Decreasing  Stocks  Survey 

Randomly pick 8 stocks from the newspaper. Using two consecutive days' business sections, test whether
the stocks went down, on average, for the second day.

1. $H_0$:

2. $H_a$:

3.  In  words,  define  the  Random  Variable.  = 

4.  The  distribution  to  use  for  the  test  is: 

5.  Calculate  the  test  statistic  using  your  data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance.
a. Graph:

Figure 10.9

b.  Calculate  the  p-value: 

7.  Do  you  reject  or  not  reject  the  null  hypothesis?  Why? 

8.  Write  a  clear  conclusion  using  a  complete  sentence. 

10.11.5  Candy  Survey 

Buy  three  small  packages  of  M&Ms  and  5  small  packages  of  Reese's  Pieces  (same  net  weight  as  the  M&Ms). 
Test  whether  or  not  the  mean  number  of  candy  pieces  per  package  is  the  same  for  the  two  brands. 

1. $H_0$:

2. $H_a$:

3.  In  words,  define  the  random  variable.  = 

4.  What  distribution  should  be  used  for  this  test? 

5.  Calculate  the  test  statistic  using  your  data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance.
a. Graph:

Figure 10.10

b.  Calculate  the  p-value: 

7.  Do  you  reject  or  not  reject  the  null  hypothesis?  Why? 

8.  Write  a  clear  conclusion  using  a  complete  sentence. 

10.11.6  Shoe  Survey 

Test  whether  women  have,  on  average,  more  pairs  of  shoes  than  men.  Include  all  forms  of  sneakers,  shoes, 
sandals,  and  boots.  Use  your  class  as  the  sample. 

1. $H_0$:

2. $H_a$:

3.  In  words,  define  the  Random  Variable.  = 

4.  The  distribution  to  use  for  the  test  is: 

5.  Calculate  the  test  statistic  using  your  data. 

6. Draw a graph and label it appropriately. Shade the actual level of significance.
a. Graph:

b. Calculate the p-value:

7. Do you reject or not reject the null hypothesis? Why?

8. Write a clear conclusion using a complete sentence.

Solutions to Exercises in Chapter 10


Solution to Example 10.2, Problem 1 (p. 433)

two means

Solution to Example 10.2, Problem 2 (p. 433)

unknown

Solution to Example 10.2, Problem 3 (p. 433)

student's-t

Solution to Example 10.2, Problem 4 (p. 433)

$\bar{X}_A - \bar{X}_B$

Solution to Example 10.2, Problem 5 (p. 433)

• $H_0: \mu_A \leq \mu_B$

• $H_a: \mu_A > \mu_B$

Solution  to  Example  10.2,  Problem  6  (p.  433) 

right 

Solution  to  Example  10.2,  Problem  7  (p.  433) 

0.1928 

Solution  to  Example  10.2,  Problem  8  (p.  433) 
Do  not  reject. 

Solution  to  Example  10.4,  Problem  (p.  436) 
The  problem  asks  for  a  difference  in  proportions. 
Solution  to  Example  10.6,  Problem  (p.  440) 

means;  At  a  5%  level  of  significance,  from  the  sample  data,  there  is  not  sufficient  evidence  to  conclude  that 
the  strength  development  class  helped  to  make  the  players  stronger,  on  average. 
Solution  to  Example  10.7,  Problem  (p.  441) 

$H_0$: $\mu_d$ equals 0; $H_a$: $\mu_d$ does not equal 0; Do not reject the null; At a 5% significance level, from the
sample  data,  there  is  not  sufficient  evidence  to  conclude  that  the  mean  difference  in  distances  between  the 
children's  dominant  versus  weaker  hands  is  significant  (there  is  not  sufficient  evidence  to  show  that  the 
children  could  push  the  shot-put  further  with  their  dominant  hand).  Alpha  and  the  p-value  are  close  so  the 
test  is  not  strong. 

Solutions  to  Practice  1:  Hypothesis  Testing  for  Two  Proportions 

Solution  to  Exercise  10.7.1  (p.  443) 

Proportions 

Solution  to  Exercise  10.7.2  (p.  443) 
a. $H_0: p_N = p_{ND}$

b. $H_a: p_N > p_{ND}$

Solution  to  Exercise  10.7.3  (p.  443) 

right-tailed 

Solution  to  Exercise  10.7.6  (p.  443) 

Normal 

Solution  to  Exercise  10.7.8  (p.  443) 
3.50 

Solution  to  Exercise  10.7.10  (p.  444) 

0.0002 

Solution  to  Exercise  10.7.11  (p.  444) 

a. Reject the null hypothesis

Solutions to Practice 2: Hypothesis Testing for Two Averages


Solution to Exercise 10.8.1 (p. 445)

Means

Solution to Exercise 10.8.2 (p. 445)

a. $H_0: \mu_W = \mu_{NW}$

b. $H_a: \mu_W \neq \mu_{NW}$

Solution to Exercise 10.8.3 (p. 445)

two-tailed

Solution to Exercise 10.8.4 (p. 445)

$\bar{X}_W - \bar{X}_{NW}$

Solution to Exercise 10.8.5 (p. 445)

The difference between the mean life spans of whites and nonwhites.
Solution  to  Exercise  10.8.8  (p.  445) 

5.42 

Solution  to  Exercise  10.8.10  (p.  446) 

0.0000 

Solution  to  Exercise  10.8.11  (p.  446) 

a. Reject the null hypothesis

Solutions  to  Homework 

Solution  to  Exercise  10.9.1  (p.  447) 

A 

Solution  to  Exercise  10.9.3  (p.  447) 
B 

Solution  to  Exercise  10.9.5  (p.  447) 

A 

Solution  to  Exercise  10.9.7  (p.  447) 

D 

Solution  to  Exercise  10.9.9  (p.  447) 

C 

Solution  to  Exercise  10.9.11  (p.  448) 

e.  -1.04 

f.  0.1519 

h.  Decision:  Do  not  reject  null 

Solution  to  Exercise  10.9.13  (p.  448) 

Standard  Normal 

e.  z  =  2.14 

f.  0.0163 

h. Decision: Reject null when $\alpha = 0.05$; Do not reject null when $\alpha = 0.01$
Solution  to  Exercise  10.9.15  (p.  449) 

e.  0.73 

f.  0.2326 

h.  Decision:  Do  not  reject  null 
Solution to Exercise 10.9.17 (p. 449)

e. -7.33

f.  0 

h.  Decision:  Reject  null 

Solution  to  Exercise  10.9.19  (p.  450) 

d.  t7 

e.  -1.51 

f.  0.1755 

h.  Decision:  Do  not  reject  null 
Solution  to  Exercise  10.9.21  (p.  450) 

d. $t_9$

e.  t  =  -1.86 

f.  0.0479 

h.  Decision:  Reject  null,  but  run  another  test 
Solution  to  Exercise  10.9.23  (p.  451) 

e.  t  =  -0.82 

f.  0.2066 

h.  Decision:  Do  not  reject  null 
Solution  to  Exercise  10.9.25  (p.  451) 

d. $t_7$

e.  t  =  2.9850 

f.  0.0102 

h.  Decision:  Reject  null;  There  is  sufficient  evidence  to  conclude  that  the  mean  difference  is  more  than  2 
minutes. 

Solution  to  Exercise  10.9.27  (p.  452) 

e.  0.22 

f.  0.4133 

h.  Decision:  Do  not  reject  null 
Solution  to  Exercise  10.9.29  (p.  452) 

e.  z  =  2.50 

f.  0.0063 

h.  Decision:  Reject  null 

Solution  to  Exercise  10.9.31  (p.  453) 

e.  -4.82 

f.  0 

h.  Decision:  Reject  null 

Solution  to  Exercise  10.9.33  (p.  453) 

d. $t_{20.32}$

e.  -4.70 

f.  0.0001 

h. Decision: Reject null

Solution to Exercise 10.9.35 (p. 453)

d. $t_{40.94}$

e.  -5.08 

f.  0 

h.  Decision:  Reject  null 

Solution  to  Exercise  10.9.37  (p.  453) 

e.  -0.9223 

f.  0.1782 

h.  Decision:  Do  not  reject  null 

Solution  to  Exercise  10.9.39  (p.  454) 

D 

Solution  to  Exercise  10.9.40  (p.  454) 
B 

Solution  to  Exercise  10.9.41  (p.  454) 

B 

Solution  to  Exercise  10.9.42  (p.  455) 

A 

Solution  to  Exercise  10.9.43  (p.  455) 
C 

Solution  to  Exercise  10.9.44  (p.  456) 

B 

Solution  to  Exercise  10.9.45  (p.  456) 

C 

Solution  to  Exercise  10.9.46  (p.  456) 

A 

Solution  to  Exercise  10.9.47  (p.  457) 

A 

Solution  to  Exercise  10.9.48  (p.  457) 

D 

Solution  to  Exercise  10.9.49  (p.  457) 
D 

Solution  to  Exercise  10.9.50  (p.  457) 

B 

Solution  to  Exercise  10.9.51  (p.  458) 

D 

Solution  to  Exercise  10.9.52  (p.  458) 
C 

Solutions  to  Review 
Solution to Exercise 10.10.1 (p. 459)

77/100

Solution to Exercise 10.10.2 (p. 459)

12/42

Solution  to  Exercise  10.10.3  (p.  459) 

a.  False 

b.  False 

c. True

d. False

Solution  to  Exercise  10.10.4  (p.  459) 

N  (180, 16.43) 

Solution  to  Exercise  10.10.5  (p.  459) 

A 

Solution  to  Exercise  10.10.6  (p.  459) 

C 

Solution to Exercise 10.10.7 (p. 460)

$N\left(2, \frac{0.25}{\sqrt{16}}\right)$

Solution  to  Exercise  10.10.9  (p.  460) 

0.5000 

Solution  to  Exercise  10.10.10  (p.  460) 
7.6 

Solution  to  Exercise  10.10.11  (p.  460) 

5 

Solution  to  Exercise  10.10.12  (p.  460) 

0.9431

Chapter 11

The  Chi-Square  Distribution 


11.1 The Chi-Square Distribution

11.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

•  Interpret  the  chi-square  probability  distribution  as  the  sample  size  changes. 

• Conduct and interpret chi-square goodness-of-fit hypothesis tests.

• Conduct and interpret chi-square test of independence hypothesis tests.

• Conduct and interpret chi-square homogeneity hypothesis tests.

• Conduct and interpret chi-square single variance hypothesis tests.


11.1.2  Introduction 

Have you ever wondered if lottery numbers were evenly distributed or if some numbers occurred with a
greater frequency? How about if the types of movies people preferred were different across different age
groups? What about if a coffee machine was dispensing approximately the same amount of coffee each
time? You could answer these questions by conducting a hypothesis test.

You  will  now  study  a  new  distribution,  one  that  is  used  to  determine  the  answers  to  the  above  examples. 
This  distribution  is  called  the  Chi-square  distribution. 

In this chapter, you will learn the three major applications of the Chi-square distribution:

• The goodness-of-fit test, which determines if data fit a particular distribution, such as with the lottery example

• The test of independence, which determines if events are independent, such as with the movie example

• The test of a single variance, which tests variability, such as with the coffee example

NOTE: Though the Chi-square calculations depend on calculators or computers for most of the
calculations, there is a table available (see the Table of Contents 15. Tables). TI-83+ and TI-84
calculator instructions are included in the text.


This content is available online at <http://cnx.org/content/m17048/1.9/>.

11.1.3 Optional Collaborative Classroom Activity

Look in the sports section of a newspaper or on the Internet for some sports data (baseball averages,
basketball scores, golf tournament scores, football odds, swimming times, etc.). Plot a histogram and a boxplot
using  your  data.  See  if  you  can  determine  a  probability  distribution  that  your  data  fits.  Have  a  discussion 
with  the  class  about  your  choice. 


11.2 Notation

The notation for the chi-square distribution is:

$\chi^2 \sim \chi^2_{df}$

where df = degrees of freedom, which depend on how chi-square is being used. (If you want to practice
calculating chi-square probabilities then use df = n - 1. The degrees of freedom for the three major uses
are each calculated differently.)

For the $\chi^2$ distribution, the population mean is $\mu = df$ and the population standard deviation is $\sigma = \sqrt{2 \cdot df}$.
The random variable is shown as $\chi^2$ but may be any upper case letter.

The random variable for a chi-square distribution with k degrees of freedom is the sum of k independent,
squared standard normal variables.

$\chi^2 = (Z_1)^2 + (Z_2)^2 + \dots + (Z_k)^2$
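
The definition as a sum of squared standard normal variables can be explored with a small simulation. This is a rough sketch rather than a formal demonstration: the seed, the choice k = 5, and the number of draws are arbitrary assumptions, and the function name `chi_square_draw` is illustrative. The sample mean and standard deviation should come out near df and the square root of 2 times df.

```python
import random
import statistics


def chi_square_draw(k, rng):
    """One chi-square value: the sum of k independent squared standard normals."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(k))


rng = random.Random(0)   # fixed seed so the run is repeatable
k = 5                    # degrees of freedom (arbitrary choice for illustration)
draws = [chi_square_draw(k, rng) for _ in range(20000)]

sample_mean = statistics.mean(draws)
sample_sd = statistics.stdev(draws)
print(round(sample_mean, 2), round(sample_sd, 2))
```

Every simulated value is nonnegative because it is a sum of squares.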


11.3 Facts About the Chi-Square Distribution

1. The curve is nonsymmetrical and skewed to the right.

2. There is a different chi-square curve for each df.

This content is available online at <http://cnx.org/content/m17052/1.6/>.
This content is available online at <http://cnx.org/content/m17045/1.6/>.

Figure 11.1 (panel (a): df = 2)


3.  The  test  statistic  for  any  test  is  always  greater  than  or  equal  to  zero. 

4. When df > 90, the chi-square curve approximates the normal. For $X \sim \chi^2_{1000}$ the mean, $\mu = df = 1000$
and the standard deviation, $\sigma = \sqrt{2 \cdot 1000} = 44.7$. Therefore, $X \sim N(1000, 44.7)$, approximately.

5. The mean, $\mu$, is located just to the right of the peak.


Figure  11.2 
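
The arithmetic in fact 4 can be reproduced in a few lines. The sketch below computes the mean and standard deviation used for the normal approximation and, as an added illustration not taken from the text, an approximate upper-tail probability under that normal curve. The function names are hypothetical.

```python
import math


def chi_square_normal_approx(df):
    """Mean and standard deviation of the normal curve that approximates chi-square(df)."""
    mean = df
    sd = math.sqrt(2 * df)
    return mean, sd


def normal_upper_tail(x, mean, sd):
    """P(X > x) for a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return 0.5 * math.erfc(z / math.sqrt(2))


mean, sd = chi_square_normal_approx(1000)
print(mean, round(sd, 1))

# Half of the approximating normal curve lies above its mean.
tail_at_mean = normal_upper_tail(1000, mean, sd)
print(tail_at_mean)
```

The approximation is only suggested for large df (the text uses df > 90); for small df the right skew makes the normal curve a poor stand-in.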


In  the  next  sections,  you  will  learn  about  four  different  applications  of  the  Chi-Square  Distribution.  These 
hypothesis  tests  are  almost  always  right-tailed  tests.  In  order  to  understand  why  the  tests  are  mostly  right- 
tailed,  you  will  need  to  look  carefully  at  the  actual  definition  of  the  test  statistic.  Think  about  the  following 
while  you  study  the  next  four  sections.  If  the  expected  and  observed  values  are  "far"  apart,  then  the  test 
statistic  will  be  "large"  and  we  will  reject  in  the  right  tail.  The  only  way  to  obtain  a  test  statistic  very  close  to 
zero,  would  be  if  the  observed  and  expected  values  are  very,  very  close  to  each  other.  A  left-tailed  test  could 
be  used  to  determine  if  the  fit  were  "too  good."  A  "too  good"  fit  might  occur  if  data  had  been  manipulated 
or  invented.  Think  about  the  implications  of  right-tailed  versus  left-tailed  hypothesis  tests  as  you  learn  the 
applications of the Chi-Square Distribution.

11.4 Goodness-of-Fit Test

In this type of hypothesis test, you determine whether the data "fit" a particular distribution or not. For
example, you may suspect your unknown data fit a binomial distribution. You use a chi-square test (meaning
the distribution for the hypothesis test is chi-square) to determine if there is a fit or not. The null
and the alternate hypotheses for this test may be written in sentences or may be stated as equations or
inequalities.


The test statistic for a goodness-of-fit test is:

$$\sum_{k} \frac{(O - E)^2}{E} \qquad (11.1)$$

where:

•  O = observed values (data)

•  E = expected values (from theory)

•  k = the number of different data cells or categories


The observed values are the data values and the expected values are the values you would expect to get
if the null hypothesis were true. There are k terms of the form $\frac{(O - E)^2}{E}$, one for each cell.

The  degrees  of  freedom  are  df  =  (number  of  categories  -  1). 


The goodness-of-fit test is almost always right-tailed. If the observed values and the corresponding
expected values are not close to each other, then the test statistic can get very large and will be way out
in the right tail of the chi-square curve.

NOTE:  The  expected  value  for  each  cell  needs  to  be  at  least  5  in  order  to  use  this  test. 

Example  11.1 

Absenteeism  of  college  students  from  math  classes  is  a  major  concern  to  math  instructors  because 
missing  class  appears  to  increase  the  drop  rate.  Suppose  that  a  study  was  done  to  determine  if  the 
actual  student  absenteeism  follows  faculty  perception.  The  faculty  expected  that  a  group  of  100 
students  would  miss  class  according  to  the  following  chart. 


Number of absences per term    Expected number of students
0-2                            50
3-5                            30
6-8                            12
9-11                           6
12+                            2

Table 11.1


A random survey across all mathematics courses was then done to determine the actual number
(observed) of absences in a course. The next chart displays the result of that survey.

*This content is available online at <http://cnx.org/content/m17192/1.8/>.




Number of absences per term    Actual number of students
0-2                            35
3-5                            40
6-8                            20
9-11                           1
12+                            4

Table 11.2


Determine the null and alternate hypotheses needed to conduct a goodness-of-fit test.

$H_0$: Student absenteeism fits faculty perception.

The alternate hypothesis is the opposite of the null hypothesis.

$H_a$: Student absenteeism does not fit faculty perception.

Problem 1

Can  you  use  the  information  as  it  appears  in  the  charts  to  conduct  the  goodness-of-fit  test? 
Solution 

No. Notice that the expected number of students for the "12+" entry is less than 5 (it is 2).
Combine that group with the "9-11" group to create new tables where the expected number of students
for each entry is at least 5. The new tables are below.


Number of absences per term    Expected number of students
0-2                            50
3-5                            30
6-8                            12
9+                             8

Table 11.3

Number of absences per term    Actual number of students
0-2                            35
3-5                            40
6-8                            20
9+                             5

Table 11.4


Problem 2

What are the degrees of freedom (df)?

Solution

There are 4 "cells" or categories in each of the new tables.

df = number of cells - 1 = 4 - 1 = 3
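The cell-merging step and the degrees-of-freedom count can be sketched in Python as follows. This is
only an illustration under stated assumptions: the helper `merge_small_tail_cells` is a hypothetical
name, and it handles only the situation in this example, where the small expected counts sit at the
end of the table. The value of the test statistic is not given in the text; the sketch simply applies
the formula $\sum (O - E)^2 / E$ to the merged tables.

```python
def merge_small_tail_cells(expected, observed, minimum=5):
    """Merge the last cell into its left neighbor until the last expected count is >= minimum."""
    expected, observed = list(expected), list(observed)
    while len(expected) > 1 and expected[-1] < minimum:
        last_expected = expected.pop()
        last_observed = observed.pop()
        expected[-1] += last_expected   # e.g. "9-11" and "12+" become "9+"
        observed[-1] += last_observed
    return expected, observed

# Counts from Table 11.1 (expected) and Table 11.2 (observed)
expected_raw = [50, 30, 12, 6, 2]
observed_raw = [35, 40, 20, 1, 4]

expected, observed = merge_small_tail_cells(expected_raw, observed_raw)
df = len(expected) - 1
statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print("expected:", expected)   # matches Table 11.3
print("observed:", observed)   # matches Table 11.4
print("df:", df)
print("test statistic:", round(statistic, 4))
```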


Example  11.2 

Employers particularly want to know which days of the week employees are absent in a five
day work week. Most employers would like to believe that employees are absent equally during
the week. Suppose a random sample of 60 managers were asked on which day of the week
they had the highest number of employee absences. The results were distributed as follows:

Day of the Week Employees Were Most Absent

                     Monday   Tuesday   Wednesday   Thursday   Friday
Number of Absences   15       12        9           9          15

Table 11.5


Problem 

For the population of employees, do the days for the highest number of absences occur with equal
frequencies during a five day work week? Test at a 5% significance level.

Solution

The null and alternate hypotheses are:

•  $H_0$: The absent days occur with equal frequencies, that is, they fit a uniform distribution.

•  $H_a$: The absent days occur with unequal frequencies, that is, they do not fit a uniform
distribution.

If  the  absent  days  occur  with  equal  frequencies,  then,  out  of  60  absent  days  (the  total  in  the  sample: 
15  +  12  +  9  +  9  +  15  =  60),  there  would  be  12  absences  on  Monday,  12  on  Tuesday,  12  on  Wednesday, 
12  on  Thursday,  and  12  on  Friday.  These  numbers  are  the  expected  (E)  values.  The  values  in  the 
table  are  the  observed  (O)  values  or  data. 

This  time,  calculate  the  test  statistic  by  hand.  Make  a  chart  with  the  following  headings  and  fill 
in  the  columns: 

•  Expected (E) values (12, 12, 12, 12, 12)

•  Observed (O) values (15, 12, 9, 9, 15)

•  (O - E)

•  $(O - E)^2$

•  $\frac{(O - E)^2}{E}$

The last column should have 0.75, 0, 0.75, 0.75, 0.75.

Now add (sum) the last column. Verify that the sum is 3. This is the $\chi^2$ test statistic.

To find the p-value, calculate $P(\chi^2 > 3)$. This test is right-tailed.

(Use a computer or calculator to find the p-value. You should get p-value = 0.5578.)

The dfs are the number of cells - 1 = 5 - 1 = 4.

TI-83+ and TI-84: Press 2nd DISTR. Arrow down to χ²cdf. Press ENTER. Enter (3,10^99,4).
Rounded to 4 decimal places, you should see 0.5578, which is the p-value.

Next,  complete  a  graph  like  the  one  below  with  the  proper  labeling  and  shading.  (You  should 
shade  the  right  tail.) 


The decision is to not reject the null hypothesis.

Conclusion:  At  a  5%  level  of  significance,  from  the  sample  data,  there  is  not  sufficient  evidence  to 
conclude  that  the  absent  days  do  not  occur  with  equal  frequencies. 
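For readers working on a computer rather than a calculator, the hand calculation above can be checked
with a short Python sketch. It uses only the standard library; `chi2_sf_even` is a hypothetical
helper written for this illustration, and it relies on the closed form of the chi-square right-tail
probability that holds when df is even (here df = 4).

```python
import math

def chi2_sf_even(x, df):
    """Right-tail probability P(chi-square > x) for an even, positive df (closed form)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles even, positive df only")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i        # builds (x/2)^i / i!
        total += term
    return math.exp(-half) * total

observed = [15, 12, 9, 9, 15]                               # Table 11.5
expected = [sum(observed) / len(observed)] * len(observed)  # 12 per day under the null
last_column = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
statistic = sum(last_column)
df = len(observed) - 1
p_value = chi2_sf_even(statistic, df)

alpha = 0.05
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(last_column, statistic, round(p_value, 4), decision)
```

The decision rule in the last lines mirrors the text: the null hypothesis is rejected only when the
p-value is smaller than the significance level.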

NOTE: TI-83+ and some TI-84 calculators do not have a special program for the test statistic for the
goodness-of-fit test. The next example (Example 11.3) has the calculator instructions. The newer
TI-84 calculators have in STAT TESTS the test Chi2 GOF. To run the test, put the observed values
(the data) into a first list and the expected values (the values you expect if the null hypothesis is
true) into a second list. Press STAT TESTS and Chi2 GOF. Enter the list names for the Observed list
and the Expected list. Enter the degrees of freedom and press calculate or draw. Make sure you
clear any lists before you start. See below.

NOTE:  To  Clear  Lists  in  the  calculators:  Go  into  STAT  EDIT  and  arrow  up  to  the  list  name  area  of 
the  particular  list.  Press  CLEAR  and  then  arrow  down.  The  list  will  be  cleared.  Or,  you  can  press 
STAT  and  press  4  (for  ClrList).  Enter  the  list  name  and  press  ENTER. 


Example  11.3 

One  study  indicates  that  the  number  of  televisions  that  American  families  have  is  distributed  (this 
is  the  given  distribution  for  the  American  population)  as  follows: 


Number of Televisions    Percent
0                        10
1                        16
2                        55
3                        11
over 3                   8

Table 11.6

The table contains expected (E) percents.

A  random  sample  of  600  families  in  the  far  western  United  States  resulted  in  the  following  data: 


Number of Televisions    Frequency
0                        66
1                        119
2                        340
3                        60
over 3                   15
                         Total = 600

Table 11.7

The table contains observed (O) frequency values.

Problem

At the 1% significance level, does it appear that the distribution "number of televisions" of far
western United States families is different from the distribution for the American population as a
whole?

Solution 

This  problem  asks  you  to  test  whether  the  far  western  United  States  families  distribution  fits  the 
distribution  of  the  American  families.  This  test  is  always  right-tailed. 

The first table contains expected percentages. To get expected (E) frequencies, multiply the
percentage by 600. The expected frequencies are:

Number of Televisions    Percent    Expected Frequency
0                        10         (0.10)(600) = 60
1                        16         (0.16)(600) = 96
2                        55         (0.55)(600) = 330
3                        11         (0.11)(600) = 66
over 3                   8          (0.08)(600) = 48

Table 11.8

Therefore, the expected frequencies are 60, 96, 330, 66, and 48. In the TI calculators, you can let the
calculator do the math. For example, instead of 60, enter .10*600.

$H_0$: The "number of televisions" distribution of far western United States families is the same as
the "number of televisions" distribution of the American population.

$H_a$: The "number of televisions" distribution of far western United States families is different from
the "number of televisions" distribution of the American population.

Distribution for the test: $\chi^2_4$ where df = (the number of cells) - 1 = 5 - 1 = 4.

NOTE: df ≠ 600 - 1

Calculate the test statistic: $\chi^2$ = 29.65

Graph: chi-square curve with df = 4; horizontal axis marked at 0, 4, and 29.65; p-value = 0.000006
(almost 0)

Probability statement: p-value = $P(\chi^2 > 29.65)$ = 0.000006.

Compare α and the p-value:

•  α = 0.01

•  p-value = 0.000006

So, α > p-value.

Make a decision: Since α > p-value, reject $H_0$.

This  means  you  reject  the  belief  that  the  distribution  for  the  far  western  states  is  the  same  as  that 
of  the  American  population  as  a  whole. 

Conclusion:  At  the  1%  significance  level,  from  the  data,  there  is  sufficient  evidence  to  conclude 
that  the  "number  of  televisions"  distribution  for  the  far  western  United  States  is  different  from  the 
"number  of  televisions"  distribution  for  the  American  population  as  a  whole. 

NOTE: TI-83+ and some TI-84 calculators: Press STAT and ENTER. Make sure to clear lists L1,
L2, and L3 if they have data in them (see the note at the end of Example 11.2). Into L1, put
the observed frequencies 66, 119, 340, 60, 15. Into L2, put the expected frequencies .10*600,
.16*600, .55*600, .11*600, .08*600. Arrow over to list L3 and up to the name area "L3". Enter
(L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You
should see "sum". Enter L3. Rounded to 2 decimal places, you should see 29.65. Press 2nd
DISTR. Press 7 or arrow down to 7:χ²cdf and press ENTER. Enter (29.65,1E99,4). Rounded
to 4 places, you should see 5.77E-6 = .000006 (rounded to 6 decimal places), which is the p-value.

The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF. To run the test, put the
observed  values  (the  data)  into  a  first  list  and  the  expected  values  (the  values  you  expect  if  the 
null  hypothesis  is  true)  into  a  second  list.  Press  STAT  TESTS  and  Chi2  GOF.  Enter  the  list  names 
for  the  Observed  list  and  the  Expected  list.  Enter  the  degrees  of  freedom  and  press  calculate  or 
draw.  Make  sure  you  clear  any  lists  before  you  start. 
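The list arithmetic described in the calculator note for Example 11.3 can also be sketched in Python.
This is a minimal illustration using only the standard library; the variable names are assumptions
made here, and the p-value line uses a closed form that is valid only for df = 4.

```python
import math

percents = [10, 16, 55, 11, 8]        # Table 11.6, expected percents
observed = [66, 119, 340, 60, 15]     # Table 11.7, observed frequencies
n = sum(observed)                     # 600 families

# Expected frequencies: percentage of the sample size
expected = [p * n / 100 for p in percents]

# Every expected count must be at least 5 to use the test
assert all(e >= 5 for e in expected)

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1                # number of cells - 1, not 600 - 1

# Right-tail probability for df = 4 only: exp(-x/2) * (1 + x/2)
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

alpha = 0.01
decision = "reject H0" if p_value < alpha else "do not reject H0"
print(expected, round(statistic, 2), f"{p_value:.2e}", decision)
```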


Example  11.4 

Suppose you flip two coins 100 times. The results are 20 HH, 27 HT, 30 TH, and 23 TT. Are the
coins fair? Test at a 5% significance level.

Solution

This  problem  can  be  set  up  as  a  goodness-of-fit  problem.  The  sample  space  for  flipping  two  fair 
coins  is  {HH,  HT,  TH,  TT}.  Out  of  100  flips,  you  would  expect  25  HH,  25  HT,  25  TH,  and  25  TT. 
This  is  the  expected  distribution.  The  question,  "Are  the  coins  fair?"  is  the  same  as  saying,  "Does 
the  distribution  of  the  coins  (20  HH,  27  HT,  30  TH,  23  TT)  fit  the  expected  distribution?" 

Random  Variable:  Let  X  =  the  number  of  heads  in  one  flip  of  the  two  coins.  X  takes  on  the  value 
0, 1,  2.  (There  are  0, 1,  or  2  heads  in  the  flip  of  2  coins.)  Therefore,  the  number  of  cells  is  3.  Since 
X  =  the  number  of  heads,  the  observed  frequencies  are  20  (for  2  heads),  57  (for  1  head),  and  23  (for 
0  heads  or  both  tails).  The  expected  frequencies  are  25  (for  2  heads),  50  (for  1  head),  and  25  (for  0 
heads  or  both  tails).  This  test  is  right-tailed. 

$H_0$: The coins are fair.

$H_a$: The coins are not fair.

Distribution for the test: $\chi^2_2$ where df = 3 - 1 = 2.

Calculate the test statistic: $\chi^2$ = 2.14

Graph: chi-square curve with df = 2; p-value = 0.3430

Probability statement: p-value = $P(\chi^2 > 2.14)$ = 0.3430

Compare α and the p-value:

•  α = 0.05

•  p-value = 0.3430

So, α < p-value.

Make a decision: Since α < p-value, do not reject $H_0$.

Conclusion:  There  is  insufficient  evidence  to  conclude  that  the  coins  are  not  fair. 
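The step of collapsing the four outcomes into three cells (number of heads) is easy to get wrong, so
here is a minimal Python sketch of it. The variable names are assumptions for illustration, and the
p-value line uses the closed form $e^{-x/2}$, which holds only for df = 2.

```python
import math

outcomes = {"HH": 20, "HT": 27, "TH": 30, "TT": 23}
flips = sum(outcomes.values())

# Collapse outcomes into cells indexed by the number of heads (0, 1, or 2)
observed = [0, 0, 0]
for outcome, count in outcomes.items():
    observed[outcome.count("H")] += count

# Fair-coin probabilities of 0, 1, and 2 heads
probabilities = [0.25, 0.50, 0.25]
expected = [p * flips for p in probabilities]

statistic = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = math.exp(-statistic / 2)    # valid for df = 2 only

print(observed, expected, round(statistic, 2), round(p_value, 4))
```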

NOTE: TI-83+ and some TI-84 calculators: Press STAT and ENTER. Make sure you clear lists L1, L2,
and L3 if they have data in them. Into L1, put the observed frequencies 20, 57, 23. Into L2, put
the expected frequencies 25, 50, 25. Arrow over to list L3 and up to the name area "L3". Enter
(L1-L2)^2/L2 and ENTER. Press 2nd QUIT. Press 2nd LIST and arrow over to MATH. Press 5. You
should see "sum". Enter L3. Rounded to 2 decimal places, you should see 2.14. Press 2nd DISTR.
Arrow down to 7:χ²cdf (or press 7). Press ENTER. Enter (2.14,1E99,2). Rounded to 4 places, you
should see .3430, which is the p-value.

The newer TI-84 calculators have in STAT TESTS the test Chi2 GOF. To run the test, put the
observed values (the data) into a first list and the expected values (the values you expect if the
null hypothesis is true) into a second list. Press STAT TESTS and Chi2 GOF. Enter the list names
for the Observed list and the Expected list. Enter the degrees of freedom and press calculate or
draw. Make sure you clear any lists before you start.


11.5  Test  of  Independence^ 

Tests of independence involve using a contingency table of observed (data) values. You first saw a
contingency table when you studied probability in the Probability Topics (Section 3.1) chapter.

The test statistic for a test of independence is similar to that of a goodness-of-fit test:

$$\sum_{(i \cdot j)} \frac{(O - E)^2}{E} \qquad (11.2)$$

where:

•  O = observed values

•  E = expected values

•  i = the number of rows in the table

•  j = the number of columns in the table

There are i · j terms of the form $\frac{(O - E)^2}{E}$.

A  test  of  independence  determines  whether  two  factors  are  independent  or  not.  You  first  encountered 
the  term  independence  in  Chapter  3.  As  a  review,  consider  the  following  example. 

NOTE:  The  expected  value  for  each  cell  needs  to  be  at  least  5  in  order  to  use  this  test. 

Example  11.5 

Suppose A = a speeding violation in the last year and B = a cell phone user while driving. If A and
B are independent then P(A AND B) = P(A)P(B). A AND B is the event that a driver received
a speeding violation last year and is also a cell phone user while driving. Suppose, in a study of
drivers who received speeding violations in the last year and who use cell phones while driving,
that 755 people were surveyed. Out of the 755, 70 had a speeding violation and 685 did not; 305
were cell phone users while driving and 450 were not.

Let y = expected number of drivers that use a cell phone while driving and received speeding
violations.

If A and B are independent, then P(A AND B) = P(A)P(B). By substitution,

$$\frac{y}{755} = \frac{70}{755} \cdot \frac{305}{755}$$

Solve for y: $y = \frac{70 \cdot 305}{755} = 28.3$


^This content is available online at <http://cnx.org/content/m17191/1.12/>.

About 28 people from the sample are expected to be cell phone users while driving and to receive
speeding  violations. 
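The substitution above amounts to multiplying the two marginal proportions and scaling by the sample
size. A minimal Python sketch, with variable names chosen here purely for illustration:

```python
total = 755          # drivers surveyed
speeding = 70        # had a speeding violation (event A)
cell_users = 305     # used a cell phone while driving (event B)

p_a = speeding / total
p_b = cell_users / total

# Under independence, P(A AND B) = P(A) * P(B); scale by the sample size for a count
expected_both = p_a * p_b * total

print(round(expected_both, 1))
```

Algebraically this is the same as (70)(305)/755, which is the "row total times column total over
total surveyed" rule used for expected counts in a contingency table.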

In a test of independence, we state the null and alternate hypotheses in words. Since the
contingency table consists of two factors, the null hypothesis states that the factors are independent
and the alternate hypothesis states that they are not independent (dependent). If we do a test of
independence using the example above, then the null hypothesis is:

$H_0$: Being a cell phone user while driving and receiving a speeding violation are independent
events.

If  the  null  hypothesis  were  true,  we  would  expect  about  28  people  to  be  cell  phone  users  while 
driving  and  to  receive  a  speeding  violation. 

The  test  of  independence  is  always  right-tailed  because  of  the  calculation  of  the  test  statistic.  If 
the  expected  and  observed  values  are  not  close  together,  then  the  test  statistic  is  very  large  and 
way  out  in  the  right  tail  of  the  chi-square  curve,  like  goodness-of-fit. 

The  degrees  of  freedom  for  the  test  of  independence  are: 

df = (number of columns - 1)(number of rows - 1)

The following formula calculates the expected number (E):

$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}}$$

Example  11.6 

In a volunteer group, adults 21 and older volunteer from one to nine hours each week to spend
time with a disabled senior citizen. The program recruits among community college students,
four-year college students, and nonstudents. The following table is a sample of the adult
volunteers and the number of hours they volunteer per week.

Number of Hours Worked Per Week by Volunteer Type (Observed)

Type of Volunteer              1-3 Hours   4-6 Hours   7-9 Hours   Row Total
Community College Students     111         96          48          255
Four-Year College Students     96          133         61          290
Nonstudents                    91          150         53          294
Column Total                   298         379         162         839

Table 11.9: The table contains observed (O) values (data).


Problem 

Are  the  number  of  hours  volunteered  independent  of  the  type  of  volunteer? 
Solution 

The observed table and the question at the end of the problem, "Are the number of hours
volunteered independent of the type of volunteer?" tell you this is a test of independence. The two
factors are number of hours volunteered and type of volunteer. This test is always right-tailed.

$H_0$: The number of hours volunteered is independent of the type of volunteer.

$H_a$: The number of hours volunteered is dependent on the type of volunteer.

The expected table is:

Number of Hours Worked Per Week by Volunteer Type (Expected)

Type of Volunteer              1-3 Hours   4-6 Hours   7-9 Hours
Community College Students     90.57       115.19      49.24
Four-Year College Students     103.00      131.00      56.00
Nonstudents                    104.42      132.81      56.77

Table 11.10: The table contains expected (E) values (data).


For example, the calculation for the expected frequency for the top left cell is

$$E = \frac{(\text{row total})(\text{column total})}{\text{total number surveyed}} = \frac{255 \cdot 298}{839} = 90.57$$

Calculate the test statistic: $\chi^2$ = 12.99 (calculator or computer)

Distribution for the test: $\chi^2_4$

df = (3 columns - 1)(3 rows - 1) = (2)(2) = 4

Graph: chi-square curve with df = 4; horizontal axis marked at 0 and 12.99; p-value = 0.0113

Probability statement: p-value = $P(\chi^2 > 12.99)$ = 0.0113

Compare α and the p-value: Since no α is given, assume α = 0.05. p-value = 0.0113. α > p-value.

Make a decision: Since α > p-value, reject $H_0$. This means that the factors are not independent.

Conclusion:  At  a  5%  level  of  significance,  from  the  data,  there  is  sufficient  evidence  to  conclude 
that  the  number  of  hours  volunteered  and  the  type  of  volunteer  are  dependent  on  one  another. 
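The whole calculation for this example (expected table, test statistic, degrees of freedom, and
p-value) can be reproduced with the following Python sketch. It is an illustration under stated
assumptions: the variable names are chosen here, only the standard library is used, and the p-value
line relies on a closed form that is valid only for df = 4.

```python
import math

# Observed counts from Table 11.9 (rows: volunteer type, columns: hours per week)
observed = [
    [111, 96, 48],    # community college students
    [96, 133, 61],    # four-year college students
    [91, 150, 53],    # nonstudents
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E = (row total)(column total) / (total number surveyed), for every cell
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = (len(col_totals) - 1) * (len(row_totals) - 1)

# Right-tail probability for df = 4 only: exp(-x/2) * (1 + x/2)
p_value = math.exp(-statistic / 2) * (1 + statistic / 2)

for row in expected:
    print([round(e, 2) for e in row])
print(round(statistic, 2), df, round(p_value, 4))
```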

For  the  above  example,  if  there  had  been  another  type  of  volunteer,  teenagers,  what  would  the 
degrees  of  freedom  be? 

NOTE:  Calculator  instructions  follow. 

TI-83+ and TI-84 calculator: Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press 3
ENTER 3 ENTER. Enter the table values by row from Example 11.6. Press ENTER after each. Press
2nd QUIT. Press STAT and arrow over to TESTS. Arrow down to C:χ²-TEST. Press ENTER. You
should see Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test
statistic is 12.9909 and the p-value = 0.0113. Do the procedure a second time but arrow down to
Draw instead of Calculate.


Example  11.7 

De Anza College is interested in the relationship between anxiety level and the need to succeed
in school. A random sample of 400 students took a test that measured anxiety level and need to
succeed in school. The table shows the results. De Anza College wants to know if anxiety level
and need to succeed in school are independent events.


Need to Succeed in School vs. Anxiety Level

Need to Succeed   High      Med-high   Medium    Med-low   Low       Row Total
in School         Anxiety   Anxiety    Anxiety   Anxiety   Anxiety
High Need         35        42         53        15        10        155
Medium Need       18        48         63        33        31        193
Low Need          4         5          11        15        17        52
Column Total      57        95         127       63        58        400

Table 11.11


Problem  1 

How  many  high  anxiety  level  students  are  expected  to  have  a  high  need  to  succeed  in  school? 
Solution 

The column total for a high anxiety level is 57. The row total for high need to succeed in school is
155. The sample size or total surveyed is 400.

$$E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}} = \frac{155 \cdot 57}{400} = 22.09$$

The  expected  number  of  students  who  have  a  high  anxiety  level  and  a  high  need  to  succeed  in 
school  is  about  22. 


Problem  2 

If  the  two  variables  are  independent,  how  many  students  do  you  expect  to  have  a  low  need  to 
succeed  in  school  and  a  med-low  level  of  anxiety? 

Solution 

The  column  total  for  a  med-low  anxiety  level  is  63.  The  row  total  for  a  low  need  to  succeed  in 
school  is  52.  The  sample  size  or  total  surveyed  is  400. 

Problem 3

a.  $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}} =$ ________

b.  The expected number of students who have a med-low anxiety level and a low need to succeed
in school is about: ________


11.6 The Chi-Square Distribution: Test for Homogeneity^

The Goodness of Fit test can be used to decide whether a population fits a given distribution, but the
Goodness of Fit test will not suffice to compare whether two populations follow the same unknown
distribution.
A  different  test,  called  the  Test  for  Homogeneity,  can  be  used  to  make  a  conclusion  about  whether  two 
populations  have  the  same  distribution.  To  calculate  the  test  statistic  for  a  test  for  homogeneity,  follow  the 
same  procedure  as  with  the  Test  of  Independence. 

NOTE:  The  expected  value  for  each  cell  needs  to  be  at  least  5  in  order  to  use  this  test. 

Hypotheses

$H_0$: The distributions of the two populations are the same.

$H_a$: The distributions of the two populations are not the same.

Test Statistic

Use a $\chi^2$ test statistic. It is computed in the same way as the test for independence.

Degrees of Freedom (df)

df = number of columns - 1

Requirements 

All  values  in  the  table  must  be  greater  than  or  equal  to  5. 
Common  Uses 

Comparing  two  populations.  For  example:  men  versus  women,  before  vs.  after,  east  vs.  west.  The  variable 
is  categorical  with  more  than  two  possible  response  values. 

Example  11.8 

Do  male  and  female  college  students  have  the  same  distribution  of  living  conditions?  Use  a  level  of 
significance  of  0.05.  Suppose  that  250  randomly  selected  male  college  students  and  300  randomly 
selected  female  college  students  were  asked  about  their  living  conditions:  Dormitory,  Apartment, 
With  Parents,  Other.  The  results  are  shown  in  the  table  below. 

Distribution of Living Conditions for College Males and College Females

           Dormitory   Apartment   With Parents   Other
Males      72          84          49             45
Females    91          86          88             35

Table 11.12

*This content is available online at <http://cnx.org/content/m43655/1.2/>.

Problem

Do male and female college students have the same distribution of living conditions?

Solution

$H_0$: The distribution of living conditions for male college students is the same as the distribution
of living conditions for female college students.

$H_a$: The distribution of living conditions for male college students is not the same as the
distribution of living conditions for female college students.

Degrees of Freedom (df):

df = number of columns - 1 = 4 - 1 = 3

Distribution for the test: $\chi^2_3$

Calculate the test statistic: $\chi^2$ = 10.1287 (calculator or computer)

Probability statement: p-value = $P(\chi^2 > 10.1287)$ = 0.0175

TI-83+ and TI-84 calculator: Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press
2 ENTER 4 ENTER. Enter the table values by row. Press ENTER after each. Press 2nd QUIT.
Press STAT and arrow over to TESTS. Arrow down to C:χ²-TEST. Press ENTER. You should see
Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test statistic
is 10.1287 and the p-value = 0.0175. Do the procedure a second time but arrow down to Draw
instead of Calculate.

Compare α and the p-value: Since no α is given, assume α = 0.05. p-value = 0.0175.
α > p-value.

Make a decision: Since α > p-value, reject $H_0$. This means that the distributions are not
the same.

Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to
conclude that the distributions of living conditions for male and female college students are not the
same.

Notice  that  the  conclusion  is  only  that  the  distributions  are  not  the  same.  We  cannot  use 
the  Test  for  Homogeneity  to  make  any  conclusions  about  how  they  differ. 
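Because the test for homogeneity uses the same arithmetic as the test of independence, the numbers in
this example can be checked with a short Python sketch. The names below are assumptions for
illustration; `chi2_sf_df3` is a hypothetical helper that evaluates the chi-square right-tail
probability for df = 3 only, using the complementary error function from the standard library.

```python
import math

def chi2_sf_df3(x):
    """Right-tail probability P(chi-square > x) for df = 3 only."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

# Observed counts from Table 11.12: Dormitory, Apartment, With Parents, Other
males = [72, 84, 49, 45]
females = [91, 86, 88, 35]
observed = [males, females]

row_totals = [sum(row) for row in observed]          # 250 males, 300 females
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts are computed exactly as in the test of independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

statistic = sum(
    (o - e) ** 2 / e
    for o_row, e_row in zip(observed, expected)
    for o, e in zip(o_row, e_row)
)
df = len(col_totals) - 1        # two populations, so df = number of columns - 1
p_value = chi2_sf_df3(statistic)

print(round(statistic, 4), df, round(p_value, 4))
```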


Example  11.9 

Both  before  and  after  a  recent  earthquake,  surveys  were  conducted  asking  voters  which  of  the 
three  candidates  they  planned  on  voting  for  in  the  upcoming  city  council  election.  Has  there  been 
a  change  since  the  earthquake?  Use  a  level  of  significance  of  0.05.  The  table  below  shows  the 
results  of  the  survey. 


          Perez   Chung   Stevens
Before    167     128     135
After     214     197     225

Table 11.13

Problem 

Has there been a change in the distribution of voter preferences since the earthquake?

Solution

$H_0$: The distribution of voter preferences was the same before and after the earthquake.

$H_a$: The distribution of voter preferences was not the same before and after the earthquake.

Degrees of Freedom (df):

df = number of columns - 1 = 3 - 1 = 2

Distribution for the test: $\chi^2_2$

Calculate the test statistic: $\chi^2$ = 3.2603 (calculator or computer)

Probability statement: p-value = $P(\chi^2 > 3.2603)$ = 0.1959

TI-83+ and TI-84 calculator: Press the MATRX key and arrow over to EDIT. Press 1:[A]. Press
2 ENTER 3 ENTER. Enter the table values by row. Press ENTER after each. Press 2nd QUIT.
Press STAT and arrow over to TESTS. Arrow down to C:χ²-TEST. Press ENTER. You should see
Observed:[A] and Expected:[B]. Arrow down to Calculate. Press ENTER. The test statistic is
3.2603 and the p-value = 0.1959. Do the procedure a second time but arrow down to Draw instead
of Calculate.

Compare α and the p-value: α = 0.05 and the p-value = 0.1959. α < p-value.

Make a decision: Since α < p-value, do not reject $H_0$.

Conclusion: At a 5% level of significance, from the data, there is insufficient evidence to
conclude that the distribution of voter preferences was not the same before and after the
earthquake.


Contributed by Dr. Larry Green


11.7 The Chi-Square Distribution: Comparison Summary of the Chi-Square
Tests Goodness-of-Fit, Independence and Homogeneity^

Comparison  Summary  of  the  Chi-Square  Tests:  Goodness-of-Fit,  Independence  and  Homogeneity 

You have seen the $\chi^2$ test statistic used in three different circumstances. Below is a summary that
will help you decide which $\chi^2$ test is the appropriate one to use.


•  Goodness-of-Fit:  Use  the  Goodness-of-Fit  Test  to  decide  whether  a  population  with  unknown  distri- 
bution "fits"  a  known  distribution.  In  this  case  there  will  be  a  single  qualitative  survey  question  or  a 
single  outcome  of  an  experiment  from  a  single  population.  Goodness-of-Fit  is  t)^ically  used  to  see 
if  the  population  is  uniform  (all  outcomes  occur  with  equal  frequency),  the  population  is  normal,  or 
the  population  is  the  same  as  another  population  with  known  distribution.  The  nuU  and  alternative 
hypotheses  are: 

Hg:  The  population  fits  the  given  distribution. 

Ha:  The  population  does  not  fit  the  given  distribution. 

•  Independence:  Use  the  Test  for  Independence  to  decide  whether  two  variables  (factors)  are  indepen- 
dent or  dependent.  In  this  case  there  will  be  two  qualitative  survey  questions  or  experiments  and  a 
contingency  table  will  be  constructed.  The  goal  is  to  see  if  the  two  variables  are  unrelated  (indepen- 
dent) or  related  (dependent).  The  null  and  alternative  hypotheses  are: 

Ho  :The  two  variables  (factors)  are  independent. 
Hfl:The  two  variables  (factors)  are  dependent. 

• Homogeneity: Use the Test for Homogeneity to decide if two populations with unknown distribution
have the same distribution as each other. In this case there will be a single qualitative survey question
or experiment given to two different populations. The null and alternative hypotheses are:

$H_0$: The two populations follow the same distribution.
$H_a$: The two populations have different distributions.

'With  contributions  by  Dr.  Larry  Green 


11.8 Test of a Single Variance


A test of a single variance assumes that the underlying distribution is normal. The null and alternate
hypotheses are stated in terms of the population variance (or population standard deviation). The test
statistic is:

$$\chi^2 = \frac{(n-1) \cdot s^2}{\sigma^2} \qquad (11.3)$$

where:

• n = the total number of data

• $s^2$ = sample variance

• $\sigma^2$ = population variance

You may think of s as the random variable in this test. The degrees of freedom are df = n - 1.
A  test  of  a  single  variance  may  be  right-tailed,  left-tailed,  or  two-tailed. 
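
The statistic in (11.3) can be sketched in a few lines of Python. This is only an illustration: the function name `single_variance_statistic` and the sample values (n = 10, s = 6, hypothesized standard deviation 5) are assumptions made for the example, not data from the text.

```python
def single_variance_statistic(n, s, sigma0):
    """Return (chi-square statistic, degrees of freedom) for a test of a single variance.

    n      -- number of data values in the sample
    s      -- sample standard deviation
    sigma0 -- population standard deviation stated in the null hypothesis
    """
    if n < 2:
        raise ValueError("need at least two data values")
    df = n - 1
    chi_sq = df * s**2 / sigma0**2  # (n - 1) * s^2 / sigma^2
    return chi_sq, df


# Hypothetical sample: 10 values with s = 6, testing sigma = 5.
chi_sq, df = single_variance_statistic(10, 6, 5)
print(round(chi_sq, 2), df)  # prints: 12.96 9
```

Whether the resulting statistic is compared against the left tail, the right tail, or both depends on how the alternate hypothesis is written.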


This content is available online at <http://cnx.org/content/m43654/1.2/>.
This content is available online at <http://cnx.org/content/m17059/1.7/>.

The following example will show you how to set up the null and alternate hypotheses. The null and
alternate  hypotheses  contain  statements  about  the  population  variance. 

Example  11.10 

Math  instructors  are  not  only  interested  in  how  their  students  do  on  exams,  on  average,  but  how 
the  exam  scores  vary.  To  many  instructors,  the  variance  (or  standard  deviation)  may  be  more 
important  than  the  average. 

Suppose  a  math  instructor  believes  that  the  standard  deviation  for  his  final  exam  is  5  points.  One 
of  his  best  students  thinks  otherwise.  The  student  claims  that  the  standard  deviation  is  more  than 
5 points. If the student were to conduct a hypothesis test, what would the null and alternate
hypotheses  be? 

Solution 

Even  though  we  are  given  the  population  standard  deviation,  we  can  set  the  test  up  using  the 
population  variance  as  follows. 

• $H_0$: $\sigma^2 = 5^2$

• $H_a$: $\sigma^2 > 5^2$


Example  11.11 

With  individual  lines  at  its  various  windows,  a  post  office  finds  that  the  standard  deviation  for 
normally  distributed  waiting  times  for  customers  on  Friday  afternoon  is  7.2  minutes.  The  post 
office experiments with a single main waiting line and finds that for a random sample of 25
customers, the waiting times for customers have a standard deviation of 3.5 minutes.

With  a  significance  level  of  5%,  test  the  claim  that  a  single  line  causes  lower  variation  among 
waiting  times  (shorter  waiting  times)  for  customers. 

Solution 

Since  the  claim  is  that  a  single  line  causes  lower  variation,  this  is  a  test  of  a  single  variance.  The 
parameter is the population variance, $\sigma^2$, or the population standard deviation, $\sigma$.

Random  Variable:  The  sample  standard  deviation,  s,  is  the  random  variable.  Let  s  =  standard 
deviation  for  the  waiting  times. 

• $H_0$: $\sigma^2 = 7.2^2$

• $H_a$: $\sigma^2 < 7.2^2$

The word "lower" tells you this is a left-tailed test.
Distribution for the test: $\chi^2_{24}$, where:

• n = the number of customers sampled

• df = n - 1 = 25 - 1 = 24

Calculate the test statistic:

$$\chi^2 = \frac{(n-1) \cdot s^2}{\sigma^2} = \frac{(25-1) \cdot 3.5^2}{7.2^2} = 5.67$$

where n = 25, s = 3.5, and $\sigma$ = 7.2.
Graph: 


Probability statement: p-value = $P(\chi^2 < 5.67) = 0.000042$

Compare $\alpha$ and the p-value: $\alpha$ = 0.05, p-value = 0.000042, so $\alpha$ > p-value

Make a decision: Since $\alpha$ > p-value, reject $H_0$.

This means that you reject $\sigma^2 = 7.2^2$. In other words, you do not think the variation in waiting
times is 7.2 minutes, but lower.

Conclusion: At a 5% level of significance, from the data, there is sufficient evidence to conclude
that a single line causes a lower variation among the waiting times; that is, with a single line, the
customer waiting times vary less than 7.2 minutes.

TI-83+ and TI-84 calculators: In 2nd DISTR, use 7:χ2cdf. The syntax is (lower, upper, df) for
the parameter list. For Example 11.11, χ2cdf(-1E99, 5.67, 24). The p-value = 0.000042.
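
For readers without a TI calculator, the left-tail p-value of Example 11.11 can be approximated with the Python standard library alone. The sketch below is an assumption-laden stand-in for the calculator's χ2cdf: the helper name `chi2_cdf` is made up for this example, and it evaluates the chi-square cumulative probability through the usual power series for the regularized lower incomplete gamma function, which is adequate for moderate values of the statistic such as the one here.

```python
import math


def chi2_cdf(x, df, max_terms=500):
    """Approximate P(chi-square with df degrees of freedom <= x).

    Uses the series for the regularized lower incomplete gamma function
    P(a, z) with a = df / 2 and z = x / 2. Intended for moderate x only.
    """
    if x <= 0:
        return 0.0
    a, z = df / 2.0, x / 2.0
    term = 1.0 / a          # k = 0 term of the series
    total = term
    for k in range(1, max_terms):
        term *= z / (a + k)  # next term: z^k / (a (a+1) ... (a+k))
        total += term
        if term < total * 1e-16:
            break
    return total * math.exp(a * math.log(z) - z - math.lgamma(a))


# Values from Example 11.11: n = 25 customers, s = 3.5, hypothesized sigma = 7.2.
n, s, sigma0 = 25, 3.5, 7.2
df = n - 1
chi_sq = df * s**2 / sigma0**2
p_value = chi2_cdf(chi_sq, df)  # left-tailed test, so use the lower tail
print(round(chi_sq, 2), p_value)
```

For a right-tailed test the p-value would instead be `1 - chi2_cdf(chi_sq, df)`.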
11.9 Summary of Formulas

The Chi-Square Probability Distribution

$\mu = df$ and $\sigma = \sqrt{2 \cdot df}$
Goodness-of-Fit  Hypothesis  Test 

• Use goodness-of-fit to test whether a data set fits a particular probability distribution.

• The degrees of freedom are number of cells or categories - 1.

• The test statistic is $\sum_{k} \frac{(O-E)^2}{E}$, where O = observed values (data), E = expected values (from theory),
and k = the number of different data cells or categories.

•  The  test  is  right-tailed. 

Test  of  Independence 

•  Use  the  test  of  independence  to  test  whether  two  factors  are  independent  or  not. 

• The degrees of freedom are equal to (number of columns - 1)(number of rows - 1).

• The test statistic is $\sum_{i \cdot j} \frac{(O-E)^2}{E}$, where O = observed values, E = expected values, i = the number of rows
in the table, and j = the number of columns in the table.

• The test is right-tailed.

• If the null hypothesis is true, the expected number $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$.
Test  of  Homogeneity 

• Use the test for homogeneity to decide if two populations with unknown distributions have the same
distribution as each other.

• The degrees of freedom are equal to number of columns - 1.

• The test statistic is $\sum_{i \cdot j} \frac{(O-E)^2}{E}$, where O = observed values, E = expected values, i = the number of rows
in the table, and j = the number of columns in the table.

• The test is right-tailed.

• If the null hypothesis is true, the expected number $E = \frac{(\text{row total})(\text{column total})}{\text{total surveyed}}$.

NOTE:  The  expected  value  for  each  cell  needs  to  be  at  least  5  in  order  to  use  the  Goodness-of-Fit, 
Independence  and  Homogeneity  tests. 
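
The expected-count formula and the $\sum \frac{(O-E)^2}{E}$ statistic summarized above can be sketched as follows. The function names and the 2 x 3 table of observed counts are hypothetical, chosen only to illustrate the arithmetic; the last line checks the "expected value at least 5" condition from the note.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all cells (two flat sequences of equal length)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def expected_counts(table):
    """Expected count for every cell of a contingency table under the null
    hypothesis: (row total)(column total) / (total surveyed)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical observed counts (not taken from the text): 2 rows, 3 columns.
observed = [[20, 30, 50],
            [30, 30, 40]]
expected = expected_counts(observed)

flat_o = [o for row in observed for o in row]
flat_e = [e for row in expected for e in row]
stat = chi_square_statistic(flat_o, flat_e)

# Degrees of freedom for a test of independence.
df = (len(observed) - 1) * (len(observed[0]) - 1)

# The tests in this chapter require every expected value to be at least 5.
all_at_least_5 = min(flat_e) >= 5
print(expected, round(stat, 2), df, all_at_least_5)
```

For a goodness-of-fit test, `chi_square_statistic` can be called directly with the observed counts and the expected counts derived from the hypothesized distribution, with df equal to the number of categories minus 1.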

Test  of  a  Single  Variance 

•  Use  the  test  to  determine  variation. 

• The degrees of freedom are the number of data values - 1 (df = n - 1).

• The test statistic is $\frac{(n-1) \cdot s^2}{\sigma^2}$, where n = the total number of data, $s^2$ = sample variance, and $\sigma^2$ =
population variance.

•  The  test  may  be  left,  right,  or  two-tailed. 


This content is available online at <http://cnx.org/content/m17058/1.8/>.
11.10 Practice 1: Goodness-of-Fit Test

11.10.1  Student  Learning  Outcomes 

•  The  student  will  conduct  a  goodness-of-fit  test. 

11.10.2  Given 

The following data are real. The cumulative number of AIDS cases reported for Santa Clara County is
broken down by ethnicity as follows: (Source: HIV/AIDS Epidemiology Santa Clara County, Santa Clara
County Public Health Department, May 2011)


Ethnicity                  Number of Cases
White                      2229
Hispanic                   1157
Black/African-American     457
Asian, Pacific Islander    232
                           Total = 4075

Table  11.14 

The  percentage  of  each  ethnic  group  in  Santa  Clara  County  is  as  follows: 


Ethnicity                  Percentage of total county population    Number expected (round to 2 decimal places)
White                      42.9%                                    1748.18
Hispanic                   26.7%
Black/African-American     2.6%
Asian, Pacific Islander    27.8%
                           Total = 100%

Table  11.15 


11.10.3  Expected  Results 

If  the  ethnicity  of  AIDS  victims  followed  the  ethnicity  of  the  total  county  population,  fill  in  the  expected 
number  of  cases  per  ethnic  group. 
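
As a minimal sketch of the arithmetic behind the expected column, each expected count is the total number of cases multiplied by that group's share of the county population. The function name `expected_from_percentages` is an assumption for illustration, and only the row already filled in Table 11.15 is used so the remaining rows are left for you to complete.

```python
def expected_from_percentages(total, percentages):
    """Expected count per category = total number of cases x population share.

    percentages maps a category name to its share in percent (e.g. 42.9).
    """
    return {name: total * pct / 100 for name, pct in percentages.items()}


# Only the row already completed in Table 11.15; 4075 is the total from Table 11.14.
county_shares = {"White": 42.9}
expected = expected_from_percentages(4075, county_shares)
print(expected["White"])  # close to the 1748.18 shown in Table 11.15
```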

11.10.4  Goodness-of-Fit  Test 

Perform a goodness-of-fit test to determine whether the make-up of AIDS cases follows the ethnicity of the
general population of Santa Clara County.

Exercise 11.10.1
$H_0$:

This content is available online at <http://cnx.org/content/m17054/1.12/>.

Exercise 11.10.2
$H_a$:

Exercise 11.10.3
Is this a right-tailed, left-tailed, or two-tailed test?

Exercise 11.10.4 (Solution on p. 518.)
degrees of freedom =

Exercise 11.10.5 (Solution on p. 518.)
$\chi^2$ test statistic =

Exercise 11.10.6 (Solution on p. 518.)
p-value =
Exercise  11.10.7 

Graph  the  situation.  Label  and  scale  the  horizontal  axis.  Mark  the  mean  and  test  statistic.  Shade 
in  the  region  corresponding  to  the  p-value. 


Let $\alpha$ = 0.05
Decision: 

Reason  for  the  Decision: 

Conclusion  (write  out  in  complete  sentences): 


11.10.5  Discussion  Question 

Exercise  11.10.8 

Does it appear that the pattern of AIDS cases in Santa Clara County corresponds to the distribution
of ethnic groups in this county? Why or why not?

11.11 Practice 2: Contingency Tables

11.11.1  Student  Learning  Outcomes 

•  The  student  will  conduct  a  test  for  independence  using  contingency  tables. 

Conduct  a  hypothesis  test  to  determine  if  smoking  level  and  ethnicity  are  independent. 

11.11.2  Collect  the  Data 

Copy  the  data  provided  in  Probability  Topics  Practice  1:  Contingency  Tables  into  the  table  below. 


Smoking Levels by Ethnicity (Observed)

Smoking Level Per Day    African American    Native Hawaiian    Latino    Japanese Americans    White    TOTALS
1-10
11-20
21-30
31+
TOTALS

Table  11.16 


11.11.3  Hypothesis 

State  the  hypotheses. 

• $H_0$:

• $H_a$:

11.11.4  Expected  Values 

Enter the expected values in the table above. Round to two decimal places.

This content is available online at <http://cnx.org/content/m17056/1.13/>.

11.11.5 Analyze the Data

Calculate the following values:

Exercise 11.11.1 (Solution on p. 518.)
Degrees of freedom =

Exercise 11.11.2 (Solution on p. 518.)
$\chi^2$ test statistic =

Exercise 11.11.3 (Solution on p. 518.)
p-value =

Exercise 11.11.4 (Solution on p. 518.)
Is this a right-tailed, left-tailed, or two-tailed test? Explain why.

11.11.6 Graph the Data
Exercise  11.11.5 

Graph  the  situation.  Label  and  scale  the  horizontal  axis.  Mark  the  mean  and  test  statistic.  Shade 
in  the  region  corresponding  to  the  p-value. 


11.11.7  Conclusions 

State the decision and conclusion (in a complete sentence) for the following preconceived levels of $\alpha$.

Exercise  11.11.6  (Solution  on  p.  518.) 

$\alpha$ = 0.05

a.  Decision: 

b.  Reason  for  the  decision: 

c.  Conclusion  (write  out  in  a  complete  sentence): 

Exercise  11.11.7 

$\alpha$ = 0.01

a.  Decision: 

b.  Reason  for  the  decision: 

c.  Conclusion  (write  out  in  a  complete  sentence): 

11.12 Practice 3: Test of a Single Variance

11.12.1  Student  Learning  Outcomes 

•  The  student  will  conduct  a  test  of  a  single  variance. 


11.12.2  Given 

Suppose an airline claims that its flights are consistently on time with an average delay of at most 15
minutes. It claims that the average delay is so consistent that the variance is no more than 150 minutes.
Doubting the consistency part of the claim, a disgruntled traveler calculates the delays for his next 25 flights. The
average delay for those 25 flights is 22 minutes with a standard deviation of 15 minutes.


11.12.3  Sample  Variance 

Exercise  11.12.1 

Is  the  traveler  disputing  the  claim  about  the  average  or  about  the  variance? 

Exercise  11.12.2  (Solution  on  p.  518.) 
A sample standard deviation of 15 minutes is the same as a sample variance of ______ minutes.

Exercise  11.12.3 

Is  this  a  right-tailed,  left-tailed,  or  two-tailed  test? 


11.12.4  Hypothesis  Test 

Perform  a  hypothesis  test  on  the  consistency  part  of  the  claim. 

Exercise  11.12.4 
$H_0$:

Exercise 11.12.5

$H_a$:

Exercise  11.12.6  (Solution  on  p.  518.) 

Degrees  of  freedom  = 

Exercise  11.12.7  (Solution  on  p.  518.) 

$\chi^2$ test statistic =

Exercise  11.12.8  (Solution  on  p.  518.) 

p-value  = 

Exercise  11.12.9 

Graph  the  situation.  Label  and  scale  the  horizontal  axis.  Mark  the  mean  and  test  statistic.  Shade 
the  p-value. 

This content is available online at <http://cnx.org/content/m17053/1.8/>.

Exercise 11.12.10

Let $\alpha$ = 0.05

Decision: 

Conclusion  (write  out  in  a  complete  sentence): 


11.12.5  Discussion  Questions 
Exercise  11.12.11 

How  did  you  know  to  test  the  variance  instead  of  the  mean? 
Exercise  11.12.12 

If  an  additional  test  were  done  on  the  claim  of  the  average  delay,  which  distribution  would  you 
use? 

Exercise  11.12.13 

If  an  additional  test  was  done  on  the  claim  of  the  average  delay,  but  45  flights  were  surveyed, 
which  distribution  would  you  use? 

11.13 Homework

Exercise 11.13.1

a. Explain why the "goodness of fit" test and the "test for independence" are generally right-tailed
tests.

b.  If  you  did  a  left-tailed  test,  what  would  you  be  testing? 


11.13.1  Word  Problems 

For each word problem, use a solution sheet to solve the hypothesis test problem. Go to The Table of
Contents 14. Appendix for the chi-square solution sheet. Round expected frequency to two decimal places.

Exercise  11.13.2 

A 6-sided die is rolled 120 times. Fill in the expected frequency column. Then, conduct a hypothesis
test to determine if the die is fair. The data below are the result of the 120 rolls.


Face Value    Frequency    Expected Frequency
1             15
2             29
3             16
4             15
5             30
6             15

Table  11.17 


Exercise  11.13.3  (Solution  on  p.  518.) 

The  marital  status  distribution  of  the  U.S.  male  population,  age  15  and  older,  is  as  shown  below. 
{Source:  U.S.  Census  Bureau,  Current  Population  Reports) 


Marital Status        Percent    Expected Frequency
never married         31.3
married               56.1
widowed               2.5
divorced/separated    10.1

Table  11.18 


Suppose  that  a  random  sample  of  400  U.S.  young  adult  males,  18  -  24  years  old,  yielded  the 
following frequency distribution. We are interested in whether this age group of males fits the
distribution of the U.S. adult population. Calculate the frequency one would expect when surveying
400 people. Fill in the above table, rounding to two decimal places.

This content is available online at <http://cnx.org/content/m17028/1.20/>.

Marital Status        Frequency
never married         140
married               238
widowed               2
divorced/separated    20

Table  11.19 


The next two questions refer to the following information. The columns in the chart below contain the
Race/Ethnicity of U.S. Public Schools for a recent year, the percentages for the Advanced Placement
Examinee Population for that class and the Overall Student Population. (Source: http://www.collegeboard.com).
Suppose the right column contains the result of a survey of 1000 local students from that year who took an
AP Exam.


Race/Ethnicity                               AP Examinee Population    Overall Student Population    Survey Frequency
Asian, Asian American or Pacific Islander    10.2%                     5.4%                          113
Black or African American                    8.2%                      14.5%                         94
Hispanic or Latino                           15.5%                     15.9%                         136
American Indian or Alaska Native             0.6%                      1.2%                          10
White                                        59.4%                     61.6%                         604
Not reported/other                           6.1%                      1.4%                          43

Table  11.20 


Exercise  11.13.4 

Perform  a  goodness-of-fit  test  to  determine  whether  the  local  results  follow  the  distribution  of  the 
U.  S.  Overall  Student  Population  based  on  ethnicity. 

Exercise  11.13.5  (Solution  on  p.  518.) 

Perform a goodness-of-fit test to determine whether the local results follow the distribution of the
U.S. AP Examinee Population, based on ethnicity.

Exercise  11.13.6 

The City of South Lake Tahoe, CA, has an Asian population of 1419 people, out of a total
population of 23,609 (Source: U.S. Census Bureau). Suppose that a survey of 1419 self-reported Asians
in the Manhattan, NY, area yielded the data in the table below. Conduct a goodness-of-fit test to
determine if the self-reported sub-groups of Asians in the Manhattan area fit that of the Lake Tahoe
area.

Race            Lake Tahoe Frequency    Manhattan Frequency
Asian Indian    131                     174
Chinese         118                     557
Filipino        1045                    518
Japanese        80                      54
Korean          12                      29
Vietnamese      9                       21
Other           24                      66

Table  11.21 


The  next  two  questions  refer  to  the  following  information:  UCLA  conducted  a  survey  of  more  than 
263,000  college  freshmen  from  385  colleges  in  fall  2005.  The  results  of  student  expected  majors  by  gender 
were  reported  in  The  Chronicle  of  Higher  Education  (2/2/2006).  Suppose  a  survey  of  5000  graduating 
females  and  5000  graduating  males  was  done  as  a  follow-up  last  year  to  determine  what  their  actual  major 
was.  The  results  are  shown  in  the  tables  for  Exercises  7  and  8.  The  second  column  in  each  table  does  not 
add to 100% because of rounding.

Exercise  11.13.7  (Solution  on  p.  519.) 

Conduct  a  hypothesis  test  to  determine  if  the  actual  college  major  of  graduating  females  fits  the 
distribution  of  their  expected  majors. 


Major                  Women - Expected Major    Women - Actual Major
Arts & Humanities      14.0%                     670
Biological Sciences    8.4%                      410
Business               13.1%                     685
Education              13.0%                     650
Engineering            2.6%                      145
Physical Sciences      2.6%                      125
Professional           18.9%                     975
Social Sciences        13.0%                     605
Technical              0.4%                      15
Other                  5.8%                      300
Undecided              8.0%                      420

Table  11.22 


Exercise  11.13.8 

Conduct  a  hypothesis  test  to  determine  if  the  actual  college  major  of  graduating  males  fits  the 
distribution  of  their  expected  majors. 

Major                  Men - Expected Major    Men - Actual Major
Arts & Humanities      11.0%                   600
Biological Sciences    6.7%                    330
Business               22.7%                   1130
Education              5.8%                    305
Engineering            15.6%                   800
Physical Sciences      3.6%                    175
Professional           9.3%                    460
Social Sciences        7.6%                    370
Technical              1.8%                    90
Other                  8.2%                    400
Undecided              6.6%                    340

Table  11.23 


Exercise  11.13.9  (Solution  on  p.  519.) 

A  recent  debate  about  where  in  the  United  States  skiers  believe  the  skiing  is  best  prompted  the 
following  survey.  Test  to  see  if  the  best  ski  area  is  independent  of  the  level  of  the  skier. 


U.S. Ski Area    Beginner    Intermediate    Advanced
Tahoe            20          30              40
Utah             10          30              60
Colorado         10          40              50

Table  11.24 


Exercise  11.13.10 

Car  manufacturers  are  interested  in  whether  there  is  a  relationship  between  the  size  of  car  an 
individual  drives  and  the  number  of  people  in  the  driver's  family  (that  is,  whether  car  size  and 
family  size  are  independent).  To  test  this,  suppose  that  800  car  owners  were  randomly  surveyed 
with  the  following  results.  Conduct  a  test  for  independence. 


Family Size    Sub & Compact    Mid-size    Full-size    Van & Truck
1              20               35          40           35
2              20               50          70           80
3-4            20               50          100          90
5+             20               30          70           70

Table  11.25 


Exercise  11.13.11  (Solution  on  p.  519.) 

College  students  may  be  interested  in  whether  or  not  their  majors  have  any  effect  on  starting 
salaries  after  graduation.  Suppose  that  300  recent  graduates  were  surveyed  as  to  their  majors 
in college and their starting salaries after graduation. Below are the data. Conduct a test for
independence.

Major          < $50,000    $50,000 - $68,999    $69,000 +
English        5            20                   5
Engineering    10           30                   60
Nursing        10           15                   15
Business       10           20                   30
Psychology     20           30                   20

Table  11.26 


Exercise  11.13.12 

Some  travel  agents  claim  that  honeymoon  hot  spots  vary  according  to  age  of  the  bride  and  groom. 
Suppose that 280 East Coast recent brides were interviewed as to where they spent their
honeymoons. The information is given below. Conduct a test for independence.

Location          20-29    30-39    40-49    50 and over
Niagara Falls     15       25       25       20
Poconos           15       25       25       10
Europe            10       25       15       5
Virgin Islands    20       25       15       5

Table  11.27 


Exercise  11.13.13  (Solution  on  p.  519.) 

A  manager  of  a  sports  club  keeps  information  concerning  the  main  sport  in  which  members 
participate  and  their  ages.  To  test  whether  there  is  a  relationship  between  the  age  of  a  member 
and  his  or  her  choice  of  sport,  643  members  of  the  sports  club  are  randomly  selected.  Conduct  a 
test  for  independence. 


Sport          18-25    26-30    31-40    41 and over
racquetball    42       58       30       46
tennis         58       76       38       65
swimming       72       60       65       33

Table  11.28 


Exercise  11.13.14 

A major food manufacturer is concerned that the sales for its skinny French fries have been
decreasing. As a part of a feasibility study, the company conducts research into the types of fries sold
across the country to determine if the type of fries sold is independent of the area of the country.
The results of the study are below. Conduct a test for independence.

Type of Fries    Northeast    South    Central    West
skinny fries     70           50       20         25
curly fries      100          60       15         30
steak fries      20           40       10         10

Table  11.29 


Exercise  11.13.15  (Solution  on  p.  519.) 

According  to  Dan  Lenard,  an  independent  insurance  agent  in  the  Buffalo,  N.Y.  area,  the  following 
is  a  breakdown  of  the  amount  of  life  insurance  purchased  by  males  in  the  following  age  groups. 
He  is  interested  in  whether  the  age  of  the  male  and  the  amount  of  life  insurance  purchased  are 
independent  events.  Conduct  a  test  for  independence. 


Age of Males    None    < $200,000    $200,000 - $400,000    $401,001 - $1,000,000    $1,000,000 +
20-29           40      15            40                     0                        5
30-39           35      5             20                     20                       10
40-49           20      0             30                     0                        30
50 +            40      30            15                     15                       10

Table  11.30 


Exercise  11.13.16 

Suppose that 600 thirty-year-olds were surveyed to determine whether or not there is a
relationship between the level of education an individual has and salary. Conduct a test for independence.

Annual Salary        Not a high school graduate    High school graduate    College graduate    Masters or doctorate
< $30,000            15                            25                      10                  5
$30,000 - $40,000    20                            40                      70                  30
$40,000 - $50,000    10                            20                      40                  55
$50,000 - $60,000    5                             10                      20                  60
$60,000 +            0                             5                       10                  150

Table  11.31 


Exercise  11.13.17  (Solution  on  p.  519.) 

A psychologist is interested in testing whether there is a difference in the distribution of personality
types for business majors and social science majors. The results of the study are shown below.
Conduct a Test of Homogeneity. Test at a 5% level of significance.

                  Open    Conscientious    Extrovert    Agreeable    Neurotic
Business          41      52               46           61           58
Social Science    72      75               63           80           65

Table 11.32


Exercise  11.13.18  (Solution  on  p.  519.) 

Do men and women select different breakfasts? The breakfast ordered by randomly selected men
and women at a popular breakfast place is shown below. Conduct a test of homogeneity. Test at a
5% level of significance.

         French Toast    Pancakes    Waffles    Omelettes
Men      47              35          28         53
Women    65              59          55         60

Table  11.33 


Exercise  11.13.19  (Solution  on  p.  520.) 

Is  there  a  difference  between  the  distribution  of  community  college  statistics  students  and  the 
distribution  of  university  statistics  students  in  what  technology  they  use  on  their  homework?  Of 
the  randomly  selected  community  college  students  43  used  a  computer,  102  used  a  calculator 
with  built  in  statistics  functions,  and  65  used  a  table  from  the  textbook.  Of  the  randomly  selected 
university  students  28  used  a  computer,  33  used  a  calculator  with  built  in  statistics  functions,  and 
40  used  a  table  from  the  textbook.  Conduct  an  appropriate  hypothesis  test  using  a  0.05  level  of 
significance. 

Exercise  11.13.20  (Solution  on  p.  520.) 

A  fisherman  is  interested  in  whether  the  distribution  of  fish  caught  in  Green  Valley  Lake  is  the 
same  as  the  distribution  of  fish  caught  in  Echo  Lake.  Of  the  191  randomly  selected  fish  caught  in 
Green  Valley  Lake,  105  were  rainbow  trout,  27  were  other  trout,  35  were  bass,  and  24  were  catfish. 
Of  the  293  randomly  selected  fish  caught  in  Echo  Lake,  115  were  rainbow  trout,  58  were  other 
trout, 67 were bass, and 53 were catfish. Perform the hypothesis test at a 5% level of significance.

Exercise  11.13.21  (Solution  on  p.  520.) 

A  plant  manager  is  concerned  her  equipment  may  need  recalibrating.  It  seems  that  the  actual 
weight  of  the  15  oz.  cereal  boxes  it  fills  has  been  fluctuating.  The  standard  deviation  should  be 
at  most  2  oz.  In  order  to  determine  if  the  machine  needs  to  be  recalibrated,  84  randomly  selected 
boxes  of  cereal  from  the  next  day's  production  were  weighed.  The  standard  deviation  of  the  84 
boxes  was  0.54.  Does  the  machine  need  to  be  recalibrated? 

Exercise  11.13.22 

Consumers  may  be  interested  in  whether  the  cost  of  a  particular  calculator  varies  from  store  to 
store.  Based  on  surveying  43  stores,  which  yielded  a  sample  mean  of  $84  and  a  sample  standard 
deviation  of  $12,  test  the  claim  that  the  standard  deviation  is  greater  than  $15. 

Exercise  11.13.23  (Solution  on  p.  520.) 

Isabella,  an  accomplished  Bay  to  Breakers  runner,  claims  that  the  standard  deviation  for  her  time 
to run the 7½ mile race is at most 3 minutes. To test her claim, Rupinder looks up 5 of her race
times.  They  are  55  minutes,  61  minutes,  58  minutes,  63  minutes,  and  57  minutes. 

Exercise  11.13.24 

Airline  companies  are  interested  in  the  consistency  of  the  number  of  babies  on  each  flight,  so  that 
they  have  adequate  safety  equipment.  They  are  also  interested  in  the  variation  of  the  number  of 
babies.  Suppose  that  an  airline  executive  believes  the  average  number  of  babies  on  flights  is  6  with 
a  variance  of  9  at  most.  The  airline  conducts  a  survey.  The  results  of  the  18  flights  surveyed  give 
a sample average of 6.4 with a sample standard deviation of 3.9. Conduct a hypothesis test of the
airline  executive's  belief. 

Exercise 11.13.25 (Solution on p. 520.)

The number of births per woman in China is 1.6 down from 5.91 in 1966 (Source: World Bank,
6/5/12). This fertility rate has been attributed to the law passed in 1979 restricting births to one
per woman. Suppose that a group of students studied whether or not the standard deviation of
births per woman was greater than 0.75. They asked 50 women across China the number of births
they had. Below are the results. Does the students' survey indicate that the standard deviation is
greater than 0.75?

# of births    Frequency
0              5
1              30
2              10
3              5

Table  11.34 


Exercise  11.13.26 

According to an avid aquarist, the average number of fish in a 20-gallon tank is 10, with a
standard deviation of 2. His friend, also an aquarist, does not believe that the standard deviation
is  2.  She  counts  the  number  of  fish  in  15  other  20-gallon  tanks.  Based  on  the  results  that  follow,  do 
you  think  that  the  standard  deviation  is  different  from  2?  Data:  11;  10;  9;  10;  10;  11;  11;  10;  12;  9;  7; 
9;  11;  10;  11 

Exercise  11.13.27  (Solution  on  p.  520.) 

The  manager  of  "Frenchies"  is  concerned  that  patrons  are  not  consistently  receiving  the  same 
amount of French fries with each order. The chef claims that the standard deviation for a
10-ounce order of fries is at most 1.5 oz., but the manager thinks that it may be higher. He randomly
weighs  49  orders  of  fries,  which  yields  a  mean  of  11  oz.  and  a  standard  deviation  of  2  oz. 


11.13.2  Try  these  true/false  questions. 

Exercise  11.13.28  (Solution  on  p.  521.) 

As  the  degrees  of  freedom  increase,  the  graph  of  the  chi-square  distribution  looks  more  and  more 
symmetrical. 

Exercise  11.13.29  (Solution  on  p.  521.) 

The  standard  deviation  of  the  chi-square  distribution  is  twice  the  mean. 

Exercise  11.13.30  (Solution  on  p.  521.) 

The mean and the median of the chi-square distribution are the same if df = 24.

Exercise  11.13.31  (Solution  on  p.  521.) 

In a Goodness-of-Fit test, the expected values are the values we would expect if the null
hypothesis were true.

Exercise  11.13.32  (Solution  on  p.  521.) 

In  general,  if  the  observed  values  and  expected  values  of  a  Goodness-of-Fit  test  are  not  close 
together, then the test statistic can get very large and on a graph will be way out in the right tail.

Exercise  11.13.33  (Solution  on  p.  521.) 

The  degrees  of  freedom  for  a  Test  for  Independence  are  equal  to  the  sample  size  minus  1. 

Exercise 11.13.34 (Solution on p. 521.)

Use  a  Goodness-of-Fit  test  to  determine  if  high  school  principals  believe  that  students  are  absent 
equally  during  the  week  or  not. 

Exercise  11.13.35  (Solution  on  p.  521.) 

The  Test  for  Independence  uses  tables  of  observed  and  expected  data  values. 

Exercise  11.13.36  (Solution  on  p.  521.) 

The  test  to  use  when  determining  if  the  college  or  university  a  student  chooses  to  attend  is  related 
to  his/her  socioeconomic  status  is  a  Test  for  Independence. 

Exercise  11.13.37  (Solution  on  p.  521.) 

The  test  to  use  to  determine  if  a  six-sided  die  is  fair  is  a  Goodness-of-Fit  test. 

Exercise  11.13.38  (Solution  on  p.  521.) 

In  a  Test  of  Independence,  the  expected  number  is  equal  to  the  row  total  multiplied  by  the  column 
total  divided  by  the  total  surveyed. 

Exercise  11.13.39  (Solution  on  p.  521.) 

In a Goodness-of-Fit test, if the p-value is 0.0113, in general, do not reject the null hypothesis.

Exercise  11.13.40  (Solution  on  p.  521.) 

For a Chi-Square distribution with degrees of freedom of 17, the probability that a value is greater
than 20 is 0.7258.

Exercise  11.13.41  (Solution  on  p.  521.) 

If  df  =  2,  the  chi-square  distribution  has  a  shape  that  reminds  us  of  the  exponential. 

11.14 Review

The next two questions refer to the following real study:

A recent survey of U.S. teenage pregnancy was answered by 720 girls, age 12 - 19. 6% of the girls surveyed
said they have been pregnant. (Parade Magazine) We are interested in the true proportion of U.S. girls, age
12 - 19, who have been pregnant.

Exercise  11.14.1  (Solution  on  p.  521.) 

Find the 95% confidence interval for the true proportion of U.S. girls, age 12 - 19, who have been
pregnant.

Exercise  11.14.2  (Solution  on  p.  521.) 

The report also stated that the results of the survey are accurate to within ± 3.7% at the 95%
confidence level. Suppose that a new study is to be done. It is desired to be accurate to within 2%
at the 95% confidence level. What is the minimum number that should be surveyed?

Exercise  11.14.3 

Given: X ~ Exp( ). Sketch the graph that depicts: P(x > 1).

The next four questions refer to the following information:

Suppose  that  the  time  that  owners  keep  their  cars  (purchased  new)  is  normally  distributed  with  a  mean 
of  7  years  and  a  standard  deviation  of  2  years.  We  are  interested  in  how  long  an  individual  keeps  his  car 
(purchased  new).  Our  population  is  people  who  buy  their  cars  new. 

Exercise  11.14.4  (Solution  on  p.  521.) 

60%  of  individuals  keep  their  cars  at  most  how  many  years? 

Exercise  11.14.5  (Solution  on  p.  521.) 

Suppose  that  we  randomly  survey  one  person.  Find  the  probability  that  person  keeps  his/her  car 
less  than  2.5  years. 

Exercise  11.14.6  (Solution  on  p.  521.) 

If we are to pick individuals 10 at a time, find the distribution for the mean length of car ownership.

Exercise  11.14.7  (Solution  on  p.  521.) 

If we are to pick 10 individuals, find the probability that the sum of their ownership time is more
than 55 years.

Exercise  11.14.8  (Solution  on  p.  521.) 

For  which  distribution  is  the  median  not  equal  to  the  mean? 

A.  Uniform 

B.  Exponential 

C.  Normal 

D.  Student-t 


Exercise  11.14.9  (Solution  on  p.  521.) 

Compare  the  standard  normal  distribution  to  the  student-t  distribution,  centered  at  0.  Explain 
which  of  the  following  are  true  and  which  are  false. 

a. As the number surveyed increases, the area to the left of -1 for the student-t distribution
approaches the area for the standard normal distribution.

b. As the degrees of freedom decrease, the graph of the student-t distribution looks more like the
graph of the standard normal distribution.

c.  If  the  number  surveyed  is  15,  the  normal  distribution  should  never  be  used. 

This content is available online at <http://cnx.org/content/m17057/1.11/>.

The next five questions refer to the following information:

We  are  interested  in  the  checking  account  balance  of  a  twenty-year-old  college  student.  We  randomly 
survey  16  twenty-year-old  college  students.  We  obtain  a  sample  mean  of  $640  and  a  sample  standard 
deviation of $150. Let X = checking account balance of an individual twenty-year-old college student.

Exercise  11.14.10 

Explain  why  we  cannot  determine  the  distribution  of  X. 

Exercise  11.14.11  (Solution  on  p.  522.) 

If you were to create a confidence interval or perform a hypothesis test for the population mean
checking account balance of 20-year-old college students, what distribution would you use?

Exercise  11.14.12  (Solution  on  p.  522.) 

Find  the  95%  confidence  interval  for  the  true  mean  checking  account  balance  of  a  twenty-year-old 
college  student. 

Exercise  11.14.13  (Solution  on  p.  522.) 

What type of data is the balance of the checking account considered to be?

Exercise  11.14.14  (Solution  on  p.  522.) 

What  type  of  data  is  the  number  of  20  year  olds  considered  to  be? 

Exercise  11.14.15  (Solution  on  p.  522.) 

On  average,  a  busy  emergency  room  gets  a  patient  with  a  shotgun  wound  about  once  per  week. 
We  are  interested  in  the  number  of  patients  with  a  shotgun  wound  the  emergency  room  gets  per 
28  days. 

a.  Define  the  random  variable  X. 

b.  State  the  distribution  for  X. 

c. Find the probability that the emergency room gets no patients with shotgun wounds in the next
28 days.

The  next  two  questions  refer  to  the  following  information: 

The probability that a certain slot machine will pay back money when a quarter is inserted is 0.30. Assume
that each play of the slot machine is independent of the others. A person puts in 15 quarters for 15 plays.

Exercise  11.14.16  (Solution  on  p.  522.) 

Is  the  expected  number  of  plays  of  the  slot  machine  that  will  pay  back  money  greater  than,  less 
than  or  the  same  as  the  median?  Explain  your  answer. 

Exercise  11.14.17  (Solution  on  p.  522.) 

Is it likely that exactly 8 of the 15 plays would pay back money? Justify your answer numerically.

Exercise  11.14.18  (Solution  on  p.  522.) 

A  game  is  played  with  the  following  rules: 

•  it  costs  $10  to  enter 

•  a  fair  coin  is  tossed  4  times 

•  if  you  do  not  get  4  heads  or  4  tails,  you  lose  your  $10 

• if you get 4 heads or 4 tails, you get back your $10, plus $30 more

Over  the  long  run  of  playing  this  game,  what  are  your  expected  earnings? 

Exercise  11.14.19  (Solution  on  p.  522.) 

•  The  mean  grade  on  a  math  exam  in  Rachel's  class  was  74,  with  a  standard  deviation  of  5. 
Rachel  earned  an  80. 

• The mean grade on a math exam in Becca's class was 47, with a standard deviation of 2.
Becca  earned  a  51. 

•  The  mean  grade  on  a  math  exam  in  Matt's  class  was  70,  with  a  standard  deviation  of  8.  Matt 
earned  an  83. 

Find whose score was the best, compared to his or her own class. Justify your answer numerically.

The next two questions refer to the following information:

A  random  sample  of  70  compulsive  gamblers  were  asked  the  number  of  days  they  go  to  casinos  per  week. 
The  results  are  given  in  the  following  graph: 

Figure 11.3: Relative frequency (vertical axis) by number of days (horizontal axis). The graph is not reproduced here.


Exercise  11.14.20  (Solution  on  p.  522.) 

Find  the  number  of  responses  that  were  "5". 

Exercise  11.14.21  (Solution  on  p.  522.) 

Find  the  mean,  standard  deviation,  the  median,  the  first  quartile,  the  third  quartile  and  the  IQR. 

Exercise  11.14.22  (Solution  on  p.  522.) 

Based  upon  research  at  De  Anza  College,  it  is  believed  that  about  19%  of  the  student  population 
speaks  a  language  other  than  English  at  home. 

Suppose that a study was done this year to see if that percent has decreased. Ninety-eight students
were  randomly  surveyed  with  the  following  results.  Fourteen  said  that  they  speak  a  language 
other  than  English  at  home. 

a.  State  an  appropriate  null  hypothesis. 

b.  State  an  appropriate  alternate  hypothesis. 

c.  Define  the  Random  Variable,  P'. 

d.  Calculate  the  test  statistic. 

e.  Calculate  the  p-value. 

f. At the 5% level of decision, what is your decision about the null hypothesis?

g.  What  is  the  Type  I  error? 

h.  What  is  the  Type  II  error? 


Exercise 11.14.23 (Solution on p. 522.)

Assume that you are an emergency paramedic called in to rescue victims of an accident. You
need to help a patient who is bleeding profusely. The patient is also considered to be a high risk
for contracting AIDS. Assume that the null hypothesis is that the patient does not have the HIV
virus. What is a Type I error?

Exercise 11.14.24 (Solution on p. 522.)

It is often said that Californians are more casual than the rest of Americans. Suppose that a
survey was done to see if the proportion of Californian professionals that wear jeans to work is
greater than the proportion of non-Californian professionals. Fifty of each were surveyed with the
following results: 15 Californians wear jeans to work and 6 non-Californians wear jeans to work.

• C = Californian professional

• NC = non-Californian professional

a. State appropriate null and alternate hypotheses.

b. Define the Random Variable.

c. Calculate the test statistic and p-value.

d. At the 5% significance level, what is your decision?

e. What is the Type I error?

f. What is the Type II error?

The next two questions refer to the following information:

A group of Statistics students have developed a technique that they feel will lower their anxiety level on
statistics exams. They measured their anxiety level at the start of the quarter and again at the end of the
quarter. The paired data are recorded in that order: (1000, 900); (1200, 1050); (600, 700); (1300, 1100);
(1000, 900); (900, 900).

Exercise 11.14.25 (Solution on p. 522.)

This is a test of (pick the best answer):

A. large samples, independent means

B. small samples, independent means

C. dependent means

Exercise 11.14.26 (Solution on p. 522.)

State the distribution to use for the test.

11.15 Lab 1: Chi-Square Goodness-of-Fit

Class  Time: 
Names: 

11.15.1  Student  Learning  Outcome: 

• The student will evaluate data collected to determine if they fit either the uniform or exponential
distributions.

11.15.2  Collect  the  Data 

NOTE:  You  may  need  to  combine  two  categories  so  that  each  cell  has  an  expected  value  of  at  least 
5. 

Go  to  your  local  supermarket.  Ask  30  people  as  they  leave  for  the  total  amount  on  their  grocery  receipts. 
(Or,  ask  3  cashiers  for  the  last  10  amounts.  Be  sure  to  include  the  express  lane,  if  it  is  open.) 

1.  Record  the  values. 


Table  11.35 

2. Construct a histogram of the data. Make 5-6 intervals. Sketch the graph using a ruler and pencil.
Scale the axes.

This content is available online at <http://cnx.org/content/m17049/1.9/>.

Figure 11.4: Axes for the histogram sketch (vertical: Relative Frequency; horizontal: Amount of Receipt).


3. Calculate the following:

a. x̄ =

b. s =

c. s² =


11.15.3  Uniform  Distribution 

Test  to  see  if  grocery  receipts  follow  the  uniform  distribution. 

1. Using your lowest and highest values, X ~ U(______, ______)

2.  Divide  the  distribution  above  into  fifths. 

3.  Calculate  the  following: 

a.  Lowest  value  = 

b.  20th  percentile  = 

c.  40th  percentile  = 

d.  60th  percentile  = 

e.  80th  percentile  = 

f. Highest value =

4.  For  each  fifth,  count  the  observed  number  of  receipts  and  record  it.  Then  determine  the  expected 
number  of  receipts  and  record  that. 

Fifth    Observed    Expected
1st      ________    ________
2nd      ________    ________
3rd      ________    ________
4th      ________    ________
5th      ________    ________

Table 11.36


5. Ho:

6. Ha:

7.  What  distribution  should  you  use  for  a  hypothesis  test? 

8.  Why  did  you  choose  this  distribution? 

9.  Calculate  the  test  statistic. 

10.  Find  the  p-value. 

11. Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the
p-value.


Figure  11.5 

12. State your decision.

13. State your conclusion in a complete sentence.
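
The uniform test above can be sketched in code. This is only an illustration: the 30 receipt totals are made up, the function name `chi_square_uniform_fifths` is hypothetical, and the degrees of freedom are assumed to be the number of cells minus 1 (here 4), which is what makes the short right-tail formula in the code valid.

```python
import math

def chi_square_uniform_fifths(receipts):
    """Goodness-of-fit statistic for receipts split into five equal-width cells."""
    low, high = min(receipts), max(receipts)
    width = (high - low) / 5
    observed = [0] * 5
    for value in receipts:
        # The highest value belongs to the fifth cell, not to a sixth one.
        cell = min(int((value - low) / width), 4)
        observed[cell] += 1
    expected = len(receipts) / 5  # under U(low, high) each fifth expects n/5 receipts
    statistic = sum((o - expected) ** 2 / expected for o in observed)
    # Right-tail area of a chi-square distribution with df = 4 (assumed: cells - 1).
    p_value = math.exp(-statistic / 2) * (1 + statistic / 2)
    return observed, expected, statistic, p_value

# Hypothetical receipt totals in dollars, invented for illustration only.
sample = [5, 8, 10, 12, 14, 16, 18, 20, 22, 24,
          26, 28, 30, 33, 36, 39, 41, 44,
          46, 50, 54, 58, 61, 64,
          66, 70, 75, 80,
          90, 105]

observed, expected, statistic, p_value = chi_square_uniform_fifths(sample)
print(observed, expected, round(statistic, 2), round(p_value, 4))
```

With these invented values each fifth expects 6 receipts, so every cell meets the "at least 5" guideline in the NOTE above.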


11.15.4  Exponential  Distribution 

Test to see if grocery receipts follow the exponential distribution with decay parameter 1/x̄.

1. Using 1/x̄ as the decay parameter, X ~ Exp(______).

2.  Calculate  the  following: 

a. Lowest value =

b.  First  quartile  = 

c.  37th  percentile  = 

d.  Median  = 

e.  63rd  percentile  = 

f. 3rd quartile =

g.  Highest  value  = 

3.  For  each  cell,  count  the  observed  number  of  receipts  and  record  it.  Then  determine  the  expected 
number  of  receipts  and  record  that. 


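Cell     Observed    Expected
1st      ________    ________
2nd      ________    ________
3rd      ________    ________
4th      ________    ________
5th      ________    ________
6th      ________    ________

Table 11.37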


4. Ho:

5. Ha:

6.  What  distribution  should  you  use  for  a  hypothesis  test? 

7.  Why  did  you  choose  this  distribution? 

8.  Calculate  the  test  statistic. 

9.  Find  the  p-value. 

10. Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the
p-value.

Figure 11.6


11.  State  your  decision. 

12.  State  your  conclusion  in  a  complete  sentence. 
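
The cell boundaries and expected counts for the exponential test can be sketched as follows. This is a rough illustration under stated assumptions: the sample mean of 40 and sample size of 30 are invented, the function name `exponential_cells` is hypothetical, and the first and last cells are treated as "below the first quartile" and "above the third quartile". For an exponential distribution with decay parameter m, the p-th percentile is -ln(1 - p)/m.

```python
import math

def exponential_cells(mean, n, cuts=(0.25, 0.37, 0.50, 0.63, 0.75)):
    """Percentile boundaries and expected counts for six exponential cells."""
    decay = 1 / mean  # decay parameter is 1 over the sample mean
    boundaries = [-math.log(1 - p) / decay for p in cuts]
    edges = (0.0,) + tuple(cuts) + (1.0,)
    # Each cell's expected count is n times the probability between its two cuts.
    expected = [n * (hi - lo) for lo, hi in zip(edges, edges[1:])]
    return boundaries, expected

# Hypothetical values: a sample mean of $40 from 30 receipts.
boundaries, expected = exponential_cells(mean=40, n=30)
print([round(b, 2) for b in boundaries])
print([round(e, 1) for e in expected])
```

With 30 receipts the four middle cells expect fewer than 5 receipts each, which is the situation the NOTE about combining categories addresses.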


11.15.5  Discussion  Questions 

1. Did your data fit either distribution? If so, which?

2. In general, do you think it's likely that data could fit more than one distribution? In complete
sentences, explain why or why not.

11.16 Lab 2: Chi-Square Test for Independence

Class  Time: 
Names: 

11.16.1  Student  Learning  Outcome: 

• The student will evaluate if there is a significant relationship between favorite type of snack and
gender.

11.16.2  Collect  the  Data 

1.  Using  your  class  as  a  sample,  complete  the  following  chart. 

NOTE: You may need to combine two food categories so that each cell has an expected value
of at least 5.

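Favorite type of snack

          sweets (candy & baked goods)    ice cream    chips & pretzels    fruits & vegetables    Total
male      ________                        ________     ________            ________               ________
female    ________                        ________     ________            ________               ________
Total     ________                        ________     ________            ________               ________

Table 11.38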


2. Looking at the above chart, does it appear to you that there is dependence between gender and
favorite type of snack food? Why or why not?

11.16.3  Hypothesis  Test 

Conduct a hypothesis test to determine if the factors are independent.

1. Ho:

2. Ha:

3.  What  distribution  should  you  use  for  a  hypothesis  test? 

4.  Why  did  you  choose  this  distribution? 

5.  Calculate  the  test  statistic. 

6.  Find  the  p-value. 

7. Sketch a graph of the situation. Label and scale the x-axis. Shade the area corresponding to the
p-value.

This content is available online at <http://cnx.org/content/m17050/1.11/>.

8. State your decision.

9.  State  your  conclusion  in  a  complete  sentence. 
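
The expected counts and test statistic for this lab can be sketched in code. The class tallies below are invented for illustration, and the function names are hypothetical. Each expected count is (row total)(column total)/(total surveyed), and the degrees of freedom are (rows - 1)(columns - 1).

```python
def expected_counts(table):
    """Expected cell counts under independence."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in col_totals] for r in row_totals]

def chi_square_statistic(table):
    """Sum of (observed - expected)^2 / expected over every cell."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical tallies: sweets, ice cream, chips & pretzels, fruits & vegetables.
observed = [[8, 6, 10, 6],   # male
            [12, 9, 5, 4]]   # female

expected = expected_counts(observed)
statistic = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)
print(expected, round(statistic, 2), df)
```

The p-value is the area to the right of the statistic under the chi-square curve with that df, which can be read from a calculator or table.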

11.16.4  Discussion  Questions 

1. Is the conclusion of your study the same as or different from your answer to question 2 under "Collect the Data" above?

2.  Why  do  you  think  that  occurred? 

Solutions to Exercises in Chapter 11

Solutions  to  Practice  1:  Goodness-of-Fit  Test 

Solution  to  Exercise  11.10.4  (p.  493) 

degrees  of  freedom  =  3 

Solution  to  Exercise  11.10.5  (p.  493) 

2016.14 

Solution  to  Exercise  11.10.6  (p.  493) 

Rounded  to  4  decimal  places,  the  p-value  is  0.0000. 

Solutions  to  Practice  2:  Contingency  Tables 

Solution  to  Exercise  11.11.1  (p.  494) 

12 

Solution  to  Exercise  11.11.2  (p.  494) 

10301.8 

Solution  to  Exercise  11.11.3  (p.  494) 

0 

Solution  to  Exercise  11.11.4  (p.  494) 

right 

Solution  to  Exercise  11.11.6  (p.  495) 

a. Reject the null hypothesis

Solutions  to  Practice  3:  Test  of  a  Single  Variance 

Solution  to  Exercise  11.12.2  (p.  496) 

225 

Solution  to  Exercise  11.12.6  (p.  496) 
24 

Solution  to  Exercise  11.12.7  (p.  496) 

36 

Solution  to  Exercise  11.12.8  (p.  496) 

0.0549 

Solutions  to  Homework 

Solution  to  Exercise  11.13.3  (p.  498) 

a.  The  data  fits  the  distribution 

b.  The  data  does  not  fit  the  distribution 

c.  3 

e.  19.27 

f.  0.0002 

h.  Decision:  Reject  Null;  Conclusion:  Data  does  not  fit  the  distribution. 
Solution  to  Exercise  11.13.5  (p.  499) 

c.  5 

e.  13.4 

f.  0.0199 

g. Decision: Reject null when α = 0.05; Conclusion: Local data do not fit the AP Examinee Distribution.

Decision: Do not reject null when α = 0.01; Conclusion: There is insufficient evidence to conclude
that Local data do not fit the AP Examinee Distribution.

Solution  to  Exercise  11.13.7  (p.  500) 

c.  10 

e.  11.48 

f.  0.3214 

h. Decision: Do not reject null when α = 0.05 and α = 0.01; Conclusion: There is insufficient evidence to
conclude that the distribution of majors by graduating females does not fit the distribution of expected
majors.

Solution  to  Exercise  11.13.9  (p.  501) 
c.  4 

e.  10.53 

f.  0.0324 

h. Decision: Reject null; Conclusion: Best ski area and level of skier are not independent.
Solution  to  Exercise  11.13.11  (p.  501) 
c.  8 

e.  33.55 

f.  0 

h. Decision: Reject null; Conclusion: Major and starting salary are not independent events.
Solution  to  Exercise  11.13.13  (p.  502) 
c.  6 

e.  25.21 

f.  0.0003 

h.  Decision:  Reject  null 

Solution  to  Exercise  11.13.15  (p.  503) 

c.  12 

e.  125.74 

f.  0 

h.  Decision:  Reject  null 

Solution  to  Exercise  11.13.17  (p.  503) 

c:  4 

d:  Chi-Square  with  df  =  4 
e:  3.01 

f:  p-value  =  0.5568 

h: ii. Do not reject the null hypothesis.

iv.  There  is  insufficient  evidence  to  conclude  that  the  distribution  of  personality  types  is  different  for 
business  and  social  science  majors. 

Solution  to  Exercise  11.13.18  (p.  504) 

c:  3 
e:  4.01 

f: p-value = 0.2601

h: ii. Do not reject the null hypothesis.

iv.  There  is  insufficient  evidence  to  conclude  that  the  distribution  of  breakfast  ordered  is  different  for 
men  and  women. 

Solution  to  Exercise  11.13.19  (p.  504) 

c:  2 

e:  7.05 

f:  p-value  =  0.0294 

h: ii. Reject the null hypothesis.

iv. There is sufficient evidence to conclude that the distribution of technology use for statistics
homework is not the same for statistics students at community colleges and at universities.

Solution  to  Exercise  11.13.20  (p.  504) 

c:  3 

d:  Chi-Square  with  df  =  3 
e:  11.75 

f:  p-value  =  0.0083 

h:  ii.  Reject  the  null  hypothesis. 

iv. There is sufficient evidence to conclude that the distribution of fish in Green Valley Lake is not the
same as the distribution of fish in Echo Lake.

Solution  to  Exercise  11.13.21  (p.  504) 

c.  83 

d.  Chi-Square  with  df  =  83 

e.  96.81 

f.  p-value  =  0.1426;  There  is  a  0.1426  probability  that  the  sample  standard  deviation  is  0.54  or  more. 

h.  Decision:  Do  not  reject  null;  Conclusion:  There  is  insufficient  evidence  to  conclude  that  the  standard 
deviation  is  more  than  0.5  oz.  It  cannot  be  determined  whether  the  equipment  needs  to  be  recalibrated 
or  not. 

Solution  to  Exercise  11.13.23  (p.  504) 

c.  4 

d.  Chi-Square  with  df  =  4 

e.  4.52 

f.  0.3402 

h.  Decision:  Do  not  reject  null. 
Solution  to  Exercise  11.13.25  (p.  505) 

c.  49 

d.  Chi-Square  with  df  =  49 

e.  54.37 

f. p-value = 0.2774; If the null hypothesis is true, there is a 0.2774 probability that the sample standard
deviation is 0.79 or more.

h.  Decision:  Do  not  reject  null;  Conclusion:  There  is  insufficient  evidence  to  conclude  that  the  standard 
deviation  is  more  than  0.75.  It  cannot  be  determined  if  the  standard  deviation  is  greater  than  0.75  or 
not. 

Solution  to  Exercise  11.13.27  (p.  505) 

a. σ² ≤ (1.5)²

c. 48

d. Chi-Square with df = 48

e.  85.33 

f.  0.0007 

h.  Decision:  Reject  null. 
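
The test statistic in part e comes from (n - 1)s²/σ² with n = 49, s = 2, and σ = 1.5. The sketch below reproduces it; the function name is hypothetical, and the right-tail formula used for the p-value is a closed form that applies only when the degrees of freedom are even, as they are here.

```python
import math

def variance_test_right_tail(n, s, sigma0):
    """Chi-square statistic and right-tail p-value for a test of a single variance."""
    df = n - 1
    if df % 2 != 0:
        raise ValueError("this closed form needs an even number of degrees of freedom")
    statistic = df * s ** 2 / sigma0 ** 2
    half = statistic / 2
    # For even df, P(X > x) = exp(-x/2) * sum of (x/2)^i / i! for i = 0 .. df/2 - 1.
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return df, statistic, math.exp(-half) * total

df, statistic, p_value = variance_test_right_tail(n=49, s=2, sigma0=1.5)
print(df, round(statistic, 2), round(p_value, 4))
```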

Solution  to  Exercise  11.13.28  (p.  505) 

True 

Solution  to  Exercise  11.13.29  (p.  505) 

False 

Solution  to  Exercise  11.13.30  (p.  505) 

False 

Solution  to  Exercise  11.13.31  (p.  505) 

True 

Solution  to  Exercise  11.13.32  (p.  505) 

True 

Solution  to  Exercise  11.13.33  (p.  505) 

False 

Solution  to  Exercise  11.13.34  (p.  506) 

True 

Solution  to  Exercise  11.13.35  (p.  506) 

True 

Solution  to  Exercise  11.13.36  (p.  506) 

True 

Solution  to  Exercise  11.13.37  (p.  506) 

True 

Solution  to  Exercise  11.13.38  (p.  506) 

True 

Solution  to  Exercise  11.13.39  (p.  506) 

False 

Solution  to  Exercise  11.13.40  (p.  506) 

False 

Solution  to  Exercise  11.13.41  (p.  506) 

True 

Solutions  to  Review 

Solution  to  Exercise  11.14.1  (p.  507) 

(0.0424,0.0770) 

Solution  to  Exercise  11.14.2  (p.  507) 
2401 
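
The two answers above can be checked with a short calculation. This sketch makes two assumptions that are not stated in the solutions: the 6% is treated as 43 of the 720 girls (43.2 rounded to a whole count), and the sample size formula uses p' = q' = 0.5 as the conservative choice. Both assumptions reproduce the printed answers.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # critical value for 95% confidence

# Exercise 11.14.1: confidence interval for a proportion.
n = 720
x = round(0.06 * n)              # assumed count of girls who said yes
p_prime = x / n
margin = z * math.sqrt(p_prime * (1 - p_prime) / n)
interval = (p_prime - margin, p_prime + margin)

# Exercise 11.14.2: minimum sample size for an error bound of 2%.
error_bound = 0.02
required_n = math.ceil(z ** 2 * 0.5 * 0.5 / error_bound ** 2)

print(tuple(round(v, 4) for v in interval), required_n)
```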

Solution  to  Exercise  11.14.4  (p.  507) 
7.5 

Solution  to  Exercise  11.14.5  (p.  507) 

0.0122 

Solution  to  Exercise  11.14.6  (p.  507) 

N  (7, 0.63) 

Solution  to  Exercise  11.14.7  (p.  507) 

0.9911 
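
The four car-ownership answers above follow from X ~ N(7, 2). A minimal sketch using Python's standard library is shown here; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

ownership = NormalDist(mu=7, sigma=2)

# 11.14.4: the 60th percentile of ownership time.
percentile_60 = ownership.inv_cdf(0.60)

# 11.14.5: probability that one person keeps a car less than 2.5 years.
p_less_than_2_5 = ownership.cdf(2.5)

# 11.14.6: the mean of 10 ownership times is normal with standard deviation 2 / sqrt(10).
mean_of_10 = NormalDist(mu=7, sigma=2 / math.sqrt(10))

# 11.14.7: the sum of 10 ownership times is normal with mean 70 and
# standard deviation 2 * sqrt(10).
sum_of_10 = NormalDist(mu=10 * 7, sigma=2 * math.sqrt(10))
p_sum_over_55 = 1 - sum_of_10.cdf(55)

print(round(percentile_60, 1), round(p_less_than_2_5, 4),
      round(mean_of_10.stdev, 2), round(p_sum_over_55, 4))
```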

Solution  to  Exercise  11.14.8  (p.  507) 

B 

Solution  to  Exercise  11.14.9  (p.  507) 
a. True

b.  False 

c.  False 

Solution  to  Exercise  11.14.11  (p.  508) 

student-t with df = 15

Solution  to  Exercise  11.14.12  (p.  508) 

(560.07,719.93) 

Solution  to  Exercise  11.14.13  (p.  508) 

quantitative  -  continuous 

Solution  to  Exercise  11.14.14  (p.  508) 

quantitative  -  discrete 

Solution  to  Exercise  11.14.15  (p.  508) 

b.  P  (4) 

c.  0.0183 

Solution  to  Exercise  11.14.16  (p.  508) 

greater  than 

Solution  to  Exercise  11.14.17  (p.  508) 

No;  P  (x  =  8)  =  0.0348 

Solution  to  Exercise  11.14.18  (p.  508) 

You  will  lose  $5 
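
The expected value behind this answer can be worked out exactly. In the sketch below, a win is counted as a net gain of $30 (the $10 entry fee comes back, plus $30 more) and a loss as a net change of -$10; the variable names are illustrative.

```python
from fractions import Fraction

# Four tosses give 2**4 = 16 equally likely outcomes; only HHHH and TTTT win.
p_win = Fraction(2, 2 ** 4)
net_win = 30
net_loss = -10

expected_earnings = p_win * net_win + (1 - p_win) * net_loss
print(expected_earnings)
```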

Solution  to  Exercise  11.14.19  (p.  508) 

Becca 
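
One way to justify this answer numerically is to compare z-scores, (score - mean) / standard deviation, for the three students. A short sketch, with illustrative variable names:

```python
# (score, class mean, class standard deviation) from the exercise
grades = {
    "Rachel": (80, 74, 5),
    "Becca": (51, 47, 2),
    "Matt": (83, 70, 8),
}

z_scores = {name: (score - mean) / sd for name, (score, mean, sd) in grades.items()}
best = max(z_scores, key=z_scores.get)
print(z_scores, best)
```

Becca's score is 2 standard deviations above her class mean, compared with 1.2 for Rachel and 1.625 for Matt.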

Solution  to  Exercise  11.14.20  (p.  509) 

14 

Solution  to  Exercise  11.14.21  (p.  509) 

• Sample mean = 3.2

• Sample standard deviation = 1.85

• Median = 3

• Quartile 1 = 2

• Quartile 3 = 5

• IQR = 3

Solution  to  Exercise  11.14.22  (p.  509) 

d.  z  =  -1.19 

e.  0.1171 

f.  Do  not  reject  the  null 
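
Parts d through f can be reproduced with a one-proportion z-test. The sketch below assumes the normal approximation and a left-tailed test, since the study asks whether the percent has decreased; the variable names are illustrative.

```python
import math
from statistics import NormalDist

p0 = 0.19          # proportion claimed in the null hypothesis
n = 98
p_prime = 14 / n   # sample proportion

z = (p_prime - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z)   # area to the left of z
reject_null = p_value < 0.05

print(round(z, 2), round(p_value, 4), reject_null)
```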

Solution  to  Exercise  11.14.23  (p.  510) 

We  conclude  that  the  patient  does  have  the  HIV  virus  when,  in  fact,  the  patient  does  not. 
Solution  to  Exercise  11.14.24  (p.  510) 

c.  z  =  2.21 ;  p  =  0.0136 

d.  Reject  the  null 

e. We conclude that the proportion of Californian professionals that wear jeans to work is greater than the
proportion of non-Californian professionals when, in fact, it is not greater.

f. We cannot conclude that the proportion of Californian professionals that wear jeans to work is greater
than the proportion of non-Californian professionals when, in fact, it is greater.
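
The test statistic and p-value in part c can be checked with a two-proportion z-test that pools the two samples. This sketch assumes the normal approximation and a right-tailed test; the variable names are illustrative.

```python
import math
from statistics import NormalDist

x_c, n_c = 15, 50     # Californian professionals who wear jeans, and number surveyed
x_nc, n_nc = 6, 50    # non-Californian professionals who wear jeans, and number surveyed

p_c, p_nc = x_c / n_c, x_nc / n_nc
pooled = (x_c + x_nc) / (n_c + n_nc)
std_error = math.sqrt(pooled * (1 - pooled) * (1 / n_c + 1 / n_nc))

z = (p_c - p_nc) / std_error
p_value = 1 - NormalDist().cdf(z)   # area to the right of z

print(round(z, 2), round(p_value, 4))
```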

Solution  to  Exercise  11.14.25  (p.  510) 

C 

Solution  to  Exercise  11.14.26  (p.  510) 

Chapter 12

Linear  Regression  and  Correlation 


12.1 Linear Regression and Correlation^1

12.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

•  Discuss  basic  ideas  of  linear  regression  and  correlation. 

•  Create  and  interpret  a  line  of  best  fit. 

•  Calculate  and  interpret  the  correlation  coefficient. 

•  Calculate  and  interpret  outliers. 


12.1.2  Introduction 

Professionals  often  want  to  know  how  two  or  more  numeric  variables  are  related.  For  example,  is  there  a 
relationship  between  the  grade  on  the  second  math  exam  a  student  takes  and  the  grade  on  the  final  exam? 
If  there  is  a  relationship,  what  is  it  and  how  strong  is  the  relationship? 

In  another  example,  your  income  may  be  determined  by  your  education,  your  profession,  your  years  of 
experience,  and  your  ability.  The  amount  you  pay  a  repair  person  for  labor  is  often  determined  by  an  initial 
amount  plus  an  hourly  fee.  These  are  all  examples  in  which  regression  can  be  used. 

The type of data described in the examples is bivariate data - "bi" for two variables. In reality, statisticians
use multivariate data, meaning many variables.

In this chapter, you will be studying the simplest form of regression, "linear regression" with one
independent variable (x). This involves data that fits a line in two dimensions. You will also study correlation,
which measures how strong the relationship is.

12.2 Linear Equations^2

Linear  regression  for  two  variables  is  based  on  a  linear  equation  with  one  independent  variable.  It  has  the 
form: 

y = a + bx (12.1)

where a and b are constant numbers.

x is the independent variable, and y is the dependent variable. Typically, you choose a value to substitute
for the independent variable and then solve for the dependent variable.

^1 This content is available online at <http://cnx.org/content/m17089/1.6/>.
^2 This content is available online at <http://cnx.org/content/m17086/1.4/>.

Example  12.1 

The  following  examples  are  linear  equations. 

y  =  3  +  2x  (12.2) 

y  =  -0.01  +  1.2x  (12.3) 

The  graph  of  a  linear  equation  of  the  form  y  =  a  +  bx  is  a  straight  line.  Any  line  that  is  not  vertical  can  be 
described  by  this  equation. 

Example 12.2

Figure 12.1: Graph of the equation y = -1 + 2x.


Linear  equations  of  this  form  occur  in  applications  of  life  sciences,  social  sciences,  psychology,  business, 
economics,  physical  sciences,  mathematics,  and  other  areas. 

Example  12.3 

Aaron's  Word  Processing  Service  (AWPS)  does  word  processing.  Its  rate  is  $32  per  hour  plus  a 
$31.50  one-time  charge.  The  total  cost  to  a  customer  depends  on  the  number  of  hours  it  takes  to 
do  the  word  processing  job. 

Problem 

Find  the  equation  that  expresses  the  total  cost  in  terms  of  the  number  of  hours  required  to  finish 
the  word  processing  job. 

Solution 

Let x = the number of hours it takes to get the job done.
Let  y  =  the  total  cost  to  the  customer. 

The  $31.50  is  a  fixed  cost.  If  it  takes  x  hours  to  complete  the  job,  then  (32)  (x)  is  the  cost  of  the 
word  processing  only.  The  total  cost  is: 

y = 31.50 + 32x
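
As a quick check of this equation, the cost can be written as a small function. The function and parameter names below are illustrative only.

```python
def total_cost(hours, one_time_charge=31.50, hourly_rate=32.0):
    """Total cost in dollars for a word processing job: y = 31.50 + 32x."""
    return one_time_charge + hourly_rate * hours

print(total_cost(0))   # the fixed charge alone
print(total_cost(3))   # a three-hour job
```

A job that takes 3 hours costs 31.50 + 32(3) = 127.50 dollars.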


12.3 Slope and Y-Intercept of a Linear Equation^3

For  the  linear  equation  y  =  a  +  bx,  b  =  slope  and  a  =  y-intercept. 

From  algebra  recall  that  the  slope  is  a  number  that  describes  the  steepness  of  a  line  and  the  y-intercept  is 
the  y  coordinate  of  the  point  (0,  a)  where  the  line  crosses  the  y-axis. 


Figure 12.2: Three possible graphs of y = a + bx. (a) If b > 0, the line slopes upward to the right. (b) If
b = 0, the line is horizontal. (c) If b < 0, the line slopes downward to the right.


Example  12.4 

Svetlana tutors to make extra money for college. For each tutoring session, she charges a one-time
fee of $25 plus $15 per hour of tutoring. A linear equation that expresses the total amount of
money Svetlana earns for each session she tutors is y = 25 + 15x.

Problem

What  are  the  independent  and  dependent  variables?  What  is  the  y-intercept  and  what  is  the 
slope?  Interpret  them  using  complete  sentences. 

Solution 

The  independent  variable  (x)  is  the  number  of  hours  Svetlana  tutors  each  session.  The  dependent 
variable  (y)  is  the  amount,  in  dollars,  Svetlana  earns  for  each  session. 

The  y-intercept  is  25  (a  =  25).  At  the  start  of  the  tutoring  session,  Svetlana  charges  a  one-time  fee 
of  $25  (this  is  when  x  =  0).  The  slope  is  15  (b  =  15).  For  each  session,  Svetlana  earns  $15  for  each 
hour  she  tutors. 
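
The interpretation above can be checked numerically: the y-intercept is the value of y when x = 0, and the slope is the change in y divided by the change in x between any two points on the line. The names in this sketch are illustrative.

```python
def session_earnings(hours, one_time_fee=25, hourly_rate=15):
    """Svetlana's earnings in dollars for one session: y = 25 + 15x."""
    return one_time_fee + hourly_rate * hours

y_intercept = session_earnings(0)
# Slope from two points on the line, x = 1 and x = 3.
slope = (session_earnings(3) - session_earnings(1)) / (3 - 1)

print(y_intercept, slope)
```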


^3 This content is available online at <http://cnx.org/content/m17083/1.5/>.

12.4 Scatter Plots^4

Before  we  take  up  the  discussion  of  linear  regression  and  correlation,  we  need  to  examine  a  way  to  display 
the  relation  between  two  variables  x  and  y.  The  most  common  and  easiest  way  is  a  scatter  plot.  The 
following  example  illustrates  a  scatter  plot. 

Example  12.5 

From  an  article  in  the  Wall  Street  Journal:  In  Europe  and  Asia,  m-commerce  is  popular.  M- 
commerce  users  have  special  mobile  phones  that  work  like  electronic  wallets  as  well  as  provide 
phone  and  Internet  services.  Users  can  do  everything  from  paying  for  parking  to  buying  a  TV  set 
or  soda  from  a  machine  to  banking  to  checking  sports  scores  on  the  Internet.  For  the  years  2000 
through  2004,  was  there  a  relationship  between  the  year  and  the  number  of  m-commerce  users? 
Construct  a  scatter  plot.  Let  x  =  the  year  and  let  y  =  the  number  of  m-commerce  users,  in  millions. 


x (year)    y (# of users)
2000        0.5
2002        20.0
2003        33.0
2004        47.0

(a)

Figure 12.3: (a) Table showing the number of m-commerce users (in millions) by year. (b) Scatter plot
showing the number of m-commerce users (in millions) by year.


A  scatter  plot  shows  the  direction  and  strength  of  a  relationship  between  the  variables.  A  clear  direction 
happens  when  there  is  either: 

•  High  values  of  one  variable  occurring  with  high  values  of  the  other  variable  or  low  values  of  one 
variable  occurring  with  low  values  of  the  other  variable. 

•  High  values  of  one  variable  occurring  with  low  values  of  the  other  variable. 

You  can  determine  the  strength  of  the  relationship  by  looking  at  the  scatter  plot  and  seeing  how  close  the 
points  are  to  a  line,  a  power  function,  an  exponential  function,  or  to  some  other  type  of  function. 
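One rough way to check the direction of a relationship numerically is to count how many pairs of points rise together and how many move in opposite directions. This pair-counting approach is our own illustration and is not a method used in the text; the function name `direction` is hypothetical. The data are the m-commerce values from Example 12.5.

```python
from itertools import combinations

years = [2000, 2002, 2003, 2004]
users = [0.5, 20.0, 33.0, 47.0]  # number of m-commerce users, in millions

def direction(xs, ys):
    """Classify the direction by comparing every pair of points."""
    rising = falling = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x2 - x1) * (y2 - y1)
        if product > 0:
            rising += 1    # both variables moved the same way
        elif product < 0:
            falling += 1   # the variables moved in opposite ways
    if rising > falling:
        return "positive"
    if falling > rising:
        return "negative"
    return "no clear direction"

print(direction(years, users))
```

A count like this says nothing about whether the pattern is linear, so it does not replace looking at the scatter plot.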

When  you  look  at  a  scatterplot,  you  want  to  notice  the  overall  pattern  and  any  deviations  from  the  pattern. 
The  following  scatterplot  examples  illustrate  these  concepts. 

*This content is available online at <http://cnx.org/content/m17082/1.8/>.

Figure 12.4: (a) Positive Linear Pattern (Strong) (b) Linear Pattern w/ One Deviation

Figure 12.5: (a) Negative Linear Pattern (Strong) (b) Negative Linear Pattern (Weak)

Figure 12.6: (a) Exponential Growth Pattern (b) No Pattern


In this chapter, we are interested in scatter plots that show a linear pattern. Linear patterns are quite
common. The linear relationship is strong if the points are close to a straight line. If we think that the points
show a linear relationship, we would like to draw a line on the scatter plot. This line can be calculated
through a process called linear regression. However, we only calculate a regression line if one of the
variables helps to explain or predict the other variable. If x is the independent variable and y the dependent
variable, then we can use a regression line to predict y for a given value of x.
12.5 The Regression Equation

Data rarely fit a straight line exactly. Usually, you must be satisfied with rough predictions. Typically, you
have a set of data whose scatter plot appears to "fit" a straight line. The line that fits the data best is called
a Line of Best Fit or Least Squares Line.

12.5.1  Optional  Collaborative  Classroom  Activity 

If  you  know  a  person's  pinky  (smallest)  finger  length,  do  you  think  you  could  predict  that  person's  height? 
Collect  data  from  your  class  (pinky  finger  length,  in  inches).  The  independent  variable,  x,  is  pinky  finger 
length  and  the  dependent  variable,  y,  is  height. 

For each set of data, plot the points on graph paper. Make your graph big enough and use a ruler. Then
"by eye" draw a line that appears to "fit" the data. For your line, pick two convenient points and use them
to find the slope of the line. Find the y-intercept of the line by extending your line so it crosses the y-axis.
Using the slope and the y-intercept, write your equation of "best fit". Do you think everyone will have
the same equation? Why or why not?

Using  your  equation,  what  is  the  predicted  height  for  a  pinky  length  of  2.5  inches? 
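The arithmetic in this activity can be sketched in a few lines of Python. The two points below are hypothetical values read off an imagined hand-drawn line (pinky length in inches, height in inches); they are not data from the text, and the function name `line_through` is our own.

```python
def line_through(p1, p2):
    """Return (slope, y_intercept) of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2:
        raise ValueError("a vertical line has no slope")
    slope = (y2 - y1) / (x2 - x1)
    intercept = y1 - slope * x1  # solve y1 = intercept + slope * x1
    return slope, intercept

# Hypothetical points chosen from a line drawn "by eye".
slope, intercept = line_through((2.0, 60.0), (3.0, 72.0))

# Predicted height for a pinky length of 2.5 inches.
predicted_height = intercept + slope * 2.5
print(slope, intercept, predicted_height)
```

Two students who pick different points, or draw different lines, will get different slopes and intercepts, which is the point of the question above.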
Example  12.6 

A random sample of 11 statistics students produced the following data where x is the third exam
score, out of 80, and y is the final exam score, out of 200. Can you predict the final exam score of a
random student if you know the third exam score?

*This content is available online at <http://cnx.org/content/m17090/1.15/>.
x (third exam score)    y (final exam score)
65                      175
67                      133
71                      185
71                      163
66                      126
75                      198
67                      153
70                      163
71                      159
69                      151
69                      159

(a)

Figure 12.7: (a) Table showing the scores on the final exam based on scores from the third exam. (b) Scatter
plot showing the scores on the final exam based on scores from the third exam.


The  third  exam  score,  x,  is  the  independent  variable  and  the  final  exam  score,  y,  is  the  dependent  variable. 
We  will  plot  a  regression  line  that  best  "fits"  the  data.  If  each  of  you  were  to  fit  a  line  "by  eye",  you  would 
draw  different  lines.  We  can  use  what  is  called  a  least-squares  regression  line  to  obtain  the  best  fit  line. 

Consider the following diagram. Each point of data is of the form $(x, y)$ and each point of the line of
best fit using least-squares linear regression has the form $(x, \hat{y})$.

The $\hat{y}$ is read "y hat" and is the estimated value of y. It is the value of y obtained using the regression line.
It is not generally equal to y from data.
The term $y_0 - \hat{y}_0 = \epsilon_0$ is called the "error" or residual. It is not an error in the sense of a mistake. The
absolute value of a residual measures the vertical distance between the actual value of y and the estimated
value of y. In other words, it measures the vertical distance between the actual data point and the predicted
point on the line.

If the observed data point lies above the line, the residual is positive, and the line underestimates the
actual data value for y. If the observed data point lies below the line, the residual is negative, and the line
overestimates the actual data value for y.

In the diagram above, $y_0 - \hat{y}_0 = \epsilon_0$ is the residual for the point shown. Here the point lies above the line
and the residual is positive.

$\epsilon$ = the Greek letter epsilon

For each data point, you can calculate the residuals or errors, $y_i - \hat{y}_i = \epsilon_i$ for $i = 1, 2, 3, ..., 11$.
Each $|\epsilon|$ is a vertical distance.

For the example about the third exam scores and the final exam scores for the 11 statistics students, there
are 11 data points. Therefore, there are 11 $\epsilon$ values. If you square each $\epsilon$ and add, you get

$(\epsilon_1)^2 + (\epsilon_2)^2 + ... + (\epsilon_{11})^2 = \sum_{i=1}^{11} \epsilon_i^2$

This  is  called  the  Sum  of  Squared  Errors  (SSE). 
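To make the residuals and the SSE concrete, here is a minimal Python sketch using the 11 exam scores from Example 12.6. It assumes the rounded line with intercept -173.51 and slope 4.83 that this section reports for the example; the function names `residuals` and `sse` are our own.

```python
third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

def residuals(xs, ys, a, b):
    """Return the residuals y - y_hat for the line y_hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

def sse(xs, ys, a, b):
    """Sum of squared errors for the line y_hat = a + b*x."""
    return sum(e ** 2 for e in residuals(xs, ys, a, b))

a, b = -173.51, 4.83  # rounded intercept and slope (assumed from this section)
errors = residuals(third_exam, final_exam, a, b)

# A positive residual means the point lies above the line.
print([round(e, 2) for e in errors])
print(round(sse(third_exam, final_exam, a, b), 2))
```

Because the coefficients are rounded, the SSE printed here is very slightly larger than the true minimum.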

Using calculus, you can determine the values of a and b that make the SSE a minimum. When you make
the SSE a minimum, you have determined the points that are on the line of best fit. It turns out that the line
of best fit has the equation:

$\hat{y} = a + bx$ (12.4)

where $a = \bar{y} - b \cdot \bar{x}$ and $b = \frac{\sum (x - \bar{x}) \cdot (y - \bar{y})}{\sum (x - \bar{x})^2}$.

$\bar{x}$ and $\bar{y}$ are the sample means of the x values and the y values, respectively. The best fit line always passes
through the point $(\bar{x}, \bar{y})$.

The slope b can be written as $b = r \cdot \left(\frac{s_y}{s_x}\right)$ where $s_y$ = the standard deviation of the y values and $s_x$ = the
standard deviation of the x values. r is the correlation coefficient, which is discussed in the next section.
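The intercept and slope can be computed directly from their definitions: the slope is the sum of the products of the deviations of x and y from their means, divided by the sum of the squared deviations of x, and the intercept is the mean of y minus the slope times the mean of x. The following Python sketch applies this to the exam data of Example 12.6; the function name `least_squares` is our own.

```python
third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

def least_squares(xs, ys):
    """Return (a, b) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_xy_dev = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_xx_dev = sum((x - x_bar) ** 2 for x in xs)
    b = sum_xy_dev / sum_xx_dev   # slope
    a = y_bar - b * x_bar         # intercept
    return a, b

a, b = least_squares(third_exam, final_exam)
print(round(a, 2), round(b, 2))

# The fitted line passes through the point of means.
x_bar = sum(third_exam) / len(third_exam)
y_bar = sum(final_exam) / len(final_exam)
print(abs((a + b * x_bar) - y_bar) < 1e-9)
```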

Least  Squares  Criteria  for  Best  Fit 

The process of fitting the best fit line is called linear regression. The idea behind finding the best fit line is
based on the assumption that the data are scattered about a straight line. The criterion for the best fit line is
that the sum of the squared errors (SSE) is minimized, that is, made as small as possible. Any other line you
might choose would have a higher SSE than the best fit line. This best fit line is called the least squares
regression line.
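The claim that any other line has a higher SSE can be spot-checked numerically. The sketch below fits the least-squares line to the exam data of Example 12.6 and compares its SSE with a few alternative lines. The alternative (intercept, slope) pairs are hypothetical "by eye" guesses of our own, not values from the text.

```python
third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

def sse(a, b):
    """Sum of squared errors of the line y_hat = a + b*x on the exam data."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(third_exam, final_exam))

# Least-squares slope and intercept from the deviation sums.
n = len(third_exam)
x_bar = sum(third_exam) / n
y_bar = sum(final_exam) / n
b_best = (sum((x - x_bar) * (y - y_bar) for x, y in zip(third_exam, final_exam))
          / sum((x - x_bar) ** 2 for x in third_exam))
a_best = y_bar - b_best * x_bar
best = sse(a_best, b_best)

# Hypothetical alternative lines; each should have a larger SSE.
candidates = [(-170.0, 4.8), (-180.0, 4.9), (-100.0, 3.8), (a_best + 1.0, b_best)]
for a, b in candidates:
    print(a, b, round(sse(a, b), 1), sse(a, b) > best)
```

A handful of comparisons is only an illustration; the calculus argument is what guarantees the result for every possible line.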

NOTE:  Computer  spreadsheets,  statistical  software,  and  many  calculators  can  quickly  calculate  the 
best  fit  line  and  create  the  graphs.  The  calculations  tend  to  be  tedious  if  done  by  hand.  Instructions 
to  use  the  TI-83,  TI-83+,  and  TI-84+  calculators  to  find  the  best  fit  line  and  create  a  scatterplot  are 
shown  at  the  end  of  this  section. 

THIRD  EXAM  vs  FINAL  EXAM  EXAMPLE: 

The  graph  of  the  line  of  best  fit  for  the  third  exam  /  final  exam  example  is  shown  below: 


Figure  12.9 


The least squares regression line (best fit line) for the third exam/final exam example has the equation:

$\hat{y} = -173.51 + 4.83x$ (12.5)
NOTE:

Remember, it is always important to plot a scatter diagram first. If the scatter plot indicates that
there is a linear relationship between the variables, then it is reasonable to use a best fit line
to make predictions for y given x within the domain of x-values in the sample data, but not
necessarily for x-values outside that domain.

You could use the line to predict the final exam score for a student who earned a grade of 73 on
the third exam.

You should NOT use the line to predict the final exam score for a student who earned a grade of
50 on the third exam, because 50 is not within the domain of the x-values in the sample data,
which are between 65 and 75.
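The rule in this note can be built into a prediction function so that out-of-domain requests are refused rather than silently answered. The following is a minimal Python sketch; the function name `predict_final` and the choice to raise `ValueError` are our own, while the coefficients and the domain 65 to 75 come from the example.

```python
def predict_final(third_exam_score, a=-173.51, b=4.83, x_min=65, x_max=75):
    """Predict the final exam score, but only inside the observed x domain."""
    if not x_min <= third_exam_score <= x_max:
        raise ValueError(
            f"{third_exam_score} is outside the observed domain [{x_min}, {x_max}]"
        )
    return a + b * third_exam_score

print(round(predict_final(73), 2))   # 73 is within the domain

try:
    predict_final(50)                # 50 is outside the domain
except ValueError as err:
    print("No prediction:", err)
```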

UNDERSTANDING  SLOPE 

The  slope  of  the  line,  b,  describes  how  changes  in  the  variables  are  related.  It  is  important  to  interpret 
the  slope  of  the  line  in  the  context  of  the  situation  represented  by  the  data.  You  should  be  able  to  write  a 
sentence  interpreting  the  slope  in  plain  English. 

INTERPRETATION OF THE SLOPE: The slope of the best fit line tells us how the dependent variable (y)
changes for every one unit increase in the independent (x) variable, on average.

THIRD  EXAM  vs  FINAL  EXAM  EXAMPLE 

Slope:  The  slope  of  the  line  is  b  =  4.83. 

Interpretation:  For  a  one  point  increase  in  the  score  on  the  third  exam,  the  final  exam  score  increases  by 
4.83  points,  on  average. 


12.5.2  Using  the  TI-83+  and  TI-84+  Calculators 

Using  the  Linear  Regression  T  Test:  LinRegTTest 

Step 1. In the STAT list editor, enter the X data in list L1 and the Y data in list L2, paired so that the
corresponding (x,y) values are next to each other in the lists. (If a particular pair of values is repeated, enter
it as many times as it appears in the data.)

Step 2. On the STAT TESTS menu, scroll down with the cursor to select the LinRegTTest. (Be careful to select
LinRegTTest as some calculators may also have a different item called LinRegTInt.)

Step 3. On the LinRegTTest input screen enter: Xlist: L1 ; Ylist: L2 ; Freq: 1

Step 4. On the next line, at the prompt β or ρ, highlight "≠ 0" and press ENTER

Step  5.  Leave  the  line  for  "RegEq:"  blank 

Step  6.  Highlight  Calculate  and  press  ENTER. 
LinRegTTest Input Screen and Output Screen

Input screen:

LinRegTTest
Xlist: L1
Ylist: L2
Freq: 1
RegEQ:
Calculate

Output screen:

LinRegTTest
y = a + bx
t = 2.657560155
p = .0261501512
df = 9
a = -173.513363
b = 4.827394209
s = 16.41237711
r² = .4396931104
r = .663093591

Figure 12.10: TI-83+ and TI-84+ calculators


The  output  screen  contains  a  lot  of  information.  For  now  we  will  focus  on  a  few  items  from  the  output,  and 
will  return  later  to  the  other  items. 

The second line says y=a+bx. Scroll down to find the values a=-173.513, and b=4.8273; the equation of the
best fit line is $\hat{y} = -173.51 + 4.83x$.

The two items at the bottom are $r^2$ = .43969 and r=.663. For now, just note where to find these values; we
will discuss them in the next two sections.

Graphing  the  Scatterplot  and  Regression  Line 

Step 1. We are assuming your X data is already entered in list L1 and your Y data is in list L2

Step 2. Press 2nd STATPLOT ENTER to use Plot 1

Step 3. On the input screen for PLOT 1, highlight On and press ENTER

Step 4. For TYPE: highlight the very first icon which is the scatterplot and press ENTER

Step 5. Indicate Xlist: L1 and Ylist: L2

Step 6. For Mark: it does not matter which symbol you highlight.

Step 7. Press the ZOOM key and then the number 9 (for menu item "ZoomStat"); the calculator will fit the
window to the data

Step 8. To graph the best fit line, press the "Y=" key and type the equation -173.5+4.83X into equation Y1.
(The X key is immediately left of the STAT key). Press ZOOM 9 again to graph it.

Step 9. Optional: If you want to change the viewing window, press the WINDOW key. Enter your desired
window using Xmin, Xmax, Ymin, Ymax

**With  contributions  from  Roberta  Bloom 
12.6 Correlation Coefficient and Coefficient of Determination
12.6.1  The  Correlation  Coefficient  r 

Besides  looking  at  the  scatter  plot  and  seeing  that  a  line  seems  reasonable,  how  can  you  tell  if  the  line  is  a 
good  predictor?  Use  the  correlation  coefficient  as  another  indicator  (besides  the  scatterplot)  of  the  strength 
of  the  relationship  between  x  and  y. 

The  correlation  coefficient,  r,  developed  by  Karl  Pearson  in  the  early  1900s,  is  a  numerical  measure  of  the 
strength  of  association  between  the  independent  variable  x  and  the  dependent  variable  y. 

The correlation coefficient is calculated as

$r = \frac{n \cdot \sum xy - (\sum x) \cdot (\sum y)}{\sqrt{[n \cdot \sum x^2 - (\sum x)^2] \cdot [n \cdot \sum y^2 - (\sum y)^2]}}$ (12.6)

where n = the number of data points.

If you suspect a linear relationship between x and y, then r can measure how strong the linear relationship
is.

What the VALUE of r tells us:

• The value of r is always between -1 and +1: $-1 \le r \le 1$.

• The size of the correlation r indicates the strength of the linear relationship between x and y. Values
of r close to -1 or to +1 indicate a stronger linear relationship between x and y.

• If r = 0 there is absolutely no linear relationship between x and y (no linear correlation).

• If r = 1, there is perfect positive correlation. If r = -1, there is perfect negative correlation. In both
these cases, all of the original data points lie on a straight line. Of course, in the real world, this will
not generally happen.

What the SIGN of r tells us

• A positive value of r means that when x increases, y tends to increase and when x decreases, y tends
to decrease (positive correlation).

• A negative value of r means that when x increases, y tends to decrease and when x decreases, y tends
to increase (negative correlation).

• The sign of r is the same as the sign of the slope, b, of the best fit line.

NOTE: Strong correlation does not suggest that x causes y or y causes x. We say "correlation does
not imply causation." For example, every person who learned math in the 17th century is dead.
However, learning math does not necessarily cause death!

*This content is available online at <http://cnx.org/content/m17092/1.12/>.


(a) Positive Correlation (b) Negative Correlation (c) Zero Correlation

Figure 12.11: (a) A scatter plot showing data with a positive correlation. 0 < r < 1 (b) A scatter plot
showing data with a negative correlation. -1 < r < 0 (c) A scatter plot showing data with zero correlation.
r = 0


The formula for r looks formidable. However, computer spreadsheets, statistical software, and many
calculators can quickly calculate r. The correlation coefficient r is the bottom item in the output screens for the
LinRegTTest on the TI-83, TI-83+, or TI-84+ calculator (see previous section for instructions).
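For readers working in software rather than on a calculator, r can be computed from five sums: the sums of x, y, xy, x squared, and y squared. The sketch below uses that standard computational form of Pearson's r on the exam data of Example 12.6; the function name `correlation` is our own.

```python
from math import sqrt

third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

def correlation(xs, ys):
    """Pearson correlation coefficient from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

r = correlation(third_exam, final_exam)
print(round(r, 4))
```

The result should agree with the value of r shown at the bottom of the LinRegTTest output for the same data.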

12.6.2  The  Coefficient  of  Determination 

$r^2$ is called the coefficient of determination. $r^2$ is the square of the correlation coefficient, but is usually
stated as a percent, rather than in decimal form. $r^2$ has an interpretation in the context of the data:

• $r^2$, when expressed as a percent, represents the percent of variation in the dependent variable y that
can be explained by variation in the independent variable x using the regression (best fit) line.

• $1 - r^2$, when expressed as a percent, represents the percent of variation in y that is NOT explained by
variation in x using the regression line. This can be seen as the scattering of the observed data points
about the regression line.

Consider the third exam/final exam example introduced in the previous section.

The line of best fit is: $\hat{y} = -173.51 + 4.83x$
The correlation coefficient is r = 0.6631
The coefficient of determination is $r^2 = 0.6631^2 = 0.4397$
Interpretation of $r^2$ in the context of this example:

Approximately 44% of the variation (0.4397 is approximately 0.44) in the final exam grades can be
explained by the variation in the grades on the third exam, using the best fit regression line.

Therefore approximately 56% of the variation (1 - 0.44 = 0.56) in the final exam grades can NOT be
explained by the variation in the grades on the third exam, using the best fit regression line. (This is
seen as the scattering of the points about the line.)
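The "explained variation" reading can be verified numerically. For a least-squares line, the unexplained share of the variation in y equals the SSE divided by the total sum of squared deviations of y from its mean, so the explained share is one minus that ratio. This identity is a standard fact that the text does not state explicitly, and the variable names below (`sst`, `r_squared`) are our own.

```python
third_exam = [65, 67, 71, 71, 66, 75, 67, 70, 71, 69, 69]
final_exam = [175, 133, 185, 163, 126, 198, 153, 163, 159, 151, 159]

n = len(third_exam)
x_bar = sum(third_exam) / n
y_bar = sum(final_exam) / n

# Least-squares slope and intercept.
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(third_exam, final_exam))
     / sum((x - x_bar) ** 2 for x in third_exam))
a = y_bar - b * x_bar

# Total variation in y, and the part left unexplained by the line.
sst = sum((y - y_bar) ** 2 for y in final_exam)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(third_exam, final_exam))

r_squared = 1 - sse / sst
print(f"{100 * r_squared:.0f}% explained, {100 * (1 - r_squared):.0f}% not explained")
```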

**With  contributions  from  Roberta  Bloom. 
12.7 Testing the Significance of the Correlation Coefficient
12.7.1  Testing  the  Significance  of  the  Correlation  Coefficient 

The  correlation  coefficient,  r,  tells  us  about  the  strength  of  the  linear  relationship  between  x  and  y.  However, 
the  reliability  of  the  linear  model  also  depends  on  how  many  observed  data  points  are  in  the  sample.  We 
need  to  look  at  both  the  value  of  the  correlation  coefficient  r  and  the  sample  size  n,  together. 

We  perform  a  hypothesis  test  of  the  "significance  of  the  correlation  coefficient"  to  decide  whether  the 
linear  relationship  in  the  sample  data  is  strong  enough  to  use  to  model  the  relationship  in  the  population. 

The  sample  data  is  used  to  compute  r,  the  correlation  coefficient  for  the  sample.  If  we  had  data  for  the 
entire  population,  we  could  find  the  population  correlation  coefficient.  But  because  we  only  have  sample 
data,  we  can  not  calculate  the  population  correlation  coefficient.  The  sample  correlation  coefficient,  r,  is  our 
estimate  of  the  unknown  population  correlation  coefficient. 

The symbol for the population correlation coefficient is $\rho$, the Greek letter "rho".

$\rho$ = population correlation coefficient (unknown)

r = sample correlation coefficient (known; calculated from sample data)

The hypothesis test lets us decide whether the value of the population correlation coefficient $\rho$ is "close to
0" or "significantly different from 0". We decide this based on the sample correlation coefficient r and the
sample size n.

If the test concludes that the correlation coefficient is significantly different from 0, we say that the
correlation coefficient is "significant".

•  Conclusion:  "There  is  sufficient  evidence  to  conclude  that  there  is  a  significant  linear  relationship 
between  x  and  y  because  the  correlation  coefficient  is  significantly  different  from  0." 

•  What  the  conclusion  means:  There  is  a  significant  linear  relationship  between  x  and  y.  We  can  use  the 
regression  line  to  model  the  linear  relationship  between  x  and  y  in  the  population. 

If the test concludes that the correlation coefficient is not significantly different from 0 (it is close to 0),
we say that the correlation coefficient is "not significant".

•  Conclusion:  "There  is  insufficient  evidence  to  conclude  that  there  is  a  significant  linear  relationship 
between  x  and  y  because  the  correlation  coefficient  is  not  significantly  different  from  0." 

• What the conclusion means: There is not a significant linear relationship between x and y. Therefore
we can NOT use the regression line to model a linear relationship between x and y in the population.

NOTE: 

•  If  r  is  significant  and  the  scatter  plot  shows  a  linear  trend,  the  line  can  be  used  to  predict  the 
value  of  y  for  values  of  x  that  are  within  the  domain  of  observed  x  values. 

• If r is not significant OR if the scatter plot does not show a linear trend, the line should not be
used for prediction.

•  If  r  is  significant  and  if  the  scatter  plot  shows  a  linear  trend,  the  line  may  NOT  be  appropriate 
or  reliable  for  prediction  OUTSIDE  the  domain  of  observed  x  values  in  the  data. 

PERFORMING  THE  HYPOTHESIS  TEST 
SETTING  UP  THE  HYPOTHESES: 

• Null Hypothesis: $H_0: \rho = 0$

• Alternate Hypothesis: $H_a: \rho \neq 0$

*This content is available online at <http://cnx.org/content/m17077/1.15/>.

What  the  hypotheses  mean  in  words: 

• Null Hypothesis $H_0$: The population correlation coefficient IS NOT significantly different from 0.
There IS NOT a significant linear relationship (correlation) between x and y in the population.

• Alternate Hypothesis $H_a$: The population correlation coefficient IS significantly DIFFERENT FROM
0. There IS A SIGNIFICANT LINEAR RELATIONSHIP (correlation) between x and y in the
population.

DRAWING  A  CONCLUSION: 

There are two methods to make the decision. Both methods are equivalent and give the same result.

Method  1:  Using  the  p-value 

Method  2:  Using  a  table  of  critical  values 

In this chapter of this textbook, we will always use a significance level of 5%, $\alpha = 0.05$
Note: Using the p-value method, you could choose any appropriate significance level you want; you are
not limited to using $\alpha = 0.05$. But the table of critical values provided in this textbook assumes that
we are using a significance level of 5%, $\alpha = 0.05$. (If we wanted to use a different significance level
than 5% with the critical value method, we would need different tables of critical values that are not
provided in this textbook.)

METHOD  1:  Using  a  p-value  to  make  a  decision 

The linear regression t-test LinRegTTEST on the TI-83+ or TI-84+ calculators calculates the p-value.
On the LinRegTTEST input screen, on the line prompt for β or ρ, highlight "≠ 0"
The output screen shows the p-value on the line that reads "p =".
(Most computer statistical software can calculate the p-value.)

If the p-value is less than the significance level ($\alpha = 0.05$):

• Decision: REJECT the null hypothesis.

•  Conclusion:  "There  is  sufficient  evidence  to  conclude  that  there  is  a  significant  linear  relationship 
between  x  and  y  because  the  correlation  coefficient  is  significantly  different  from  0." 

If the p-value is NOT less than the significance level ($\alpha = 0.05$)

• Decision: DO NOT REJECT the null hypothesis.

•  Conclusion:  "There  is  insufficient  evidence  to  conclude  that  there  is  a  significant  linear  relationship 
between  x  and  y  because  the  correlation  coefficient  is  NOT  significantly  different  from  0." 

Calculation  Notes: 

You will use technology to calculate the p-value. The following describes the calculations used to compute the
test statistic and the p-value:

The p-value is calculated using a t-distribution with n - 2 degrees of freedom.

The formula for the test statistic is $t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$. The value of the test statistic, t, is shown in the computer
or calculator output along with the p-value. The test statistic t has the same sign as the correlation
coefficient r.

The p-value is the combined area in both tails.

An alternative way to calculate the p-value (p) given by LinRegTTest is the command 2*tcdf(abs(t),10^99,
n-2) in 2nd DISTR.
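Without a calculator, the same quantities can be approximated in Python using only the standard library. The test statistic is r times the square root of n - 2, divided by the square root of 1 - r squared. The two-tailed p-value is then found by numerically integrating the Student t density with Simpson's rule. The integration approach, the step count, and the function names are our own choices for illustration; statistical software would normally use a built-in t distribution instead.

```python
from math import gamma, pi, sqrt

def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)

def t_pdf(u, df):
    """Density of the Student t distribution with df degrees of freedom."""
    coef = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return coef * (1 + u * u / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Combined area in both tails beyond |t| (steps must be even)."""
    upper = abs(t)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_zero_to_t = (h / 3.0) * total   # Simpson's rule on [0, |t|]
    return 1.0 - 2.0 * area_zero_to_t

r, n = 0.663093591, 11   # values from the exam example's calculator output
t = t_statistic(r, n)
p = two_tailed_p(t, n - 2)
print(round(t, 4), round(p, 4))
```

The printed values should be close to the t and p lines of the LinRegTTest output for the third exam/final exam data.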

THIRD  EXAM  vs  FINAL  EXAM  EXAMPLE:  p  value  method 
• Consider the third exam / final exam example.

• The line of best fit is: $\hat{y} = -173.51 + 4.83x$ with r = 0.6631 and there are n = 11 data points.

• Can the regression line be used for prediction? Given a third exam score (x value), can we use the
line to predict the final exam score (predicted y value)?

$H_0: \rho = 0$
$\alpha = 0.05$

The p-value is 0.026 (from LinRegTTest on your calculator or from computer software)
The p-value, 0.026, is less than the significance level of $\alpha = 0.05$
Decision: Reject the Null Hypothesis $H_0$

Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between
x and y because the correlation coefficient is significantly different from 0.

Because r is significant and the scatter plot shows a linear trend, the regression line can be used to
predict final exam scores.

METHOD  2:  Using  a  table  of  Critical  Values  to  make  a  decision 

The 95% Critical Values of the Sample Correlation Coefficient Table (Section 12.10) at the end of this
chapter (before the Summary (Section 12.11)) may be used to give you a good idea of whether the
computed value of r is significant or not. Compare r to the appropriate critical value in the table. If r is not
between the positive and negative critical values, then the correlation coefficient is significant. If r is
significant, then you may want to use the line for prediction.

Example  12.7 

Suppose you computed r = 0.801 using n = 10 data points. df = n - 2 = 10 - 2 = 8. The
critical values associated with df = 8 are -0.632 and +0.632. If r < negative critical value or r >
positive critical value, then r is significant. Since r = 0.801 and 0.801 > 0.632, r is significant and
the line may be used for prediction. If you view this example on a number line, it will help you.

Figure 12.12: r is not significant between -0.632 and +0.632. r = 0.801 > +0.632. Therefore, r is significant.


Example  12.8 

Suppose you computed r = -0.624 with 14 data points. df = 14 - 2 = 12. The critical values are
-0.532 and 0.532. Since -0.624 < -0.532, r is significant and the line may be used for prediction.

Figure 12.13: r = -0.624 < -0.532. Therefore, r is significant.
Example 12.9

Suppose you computed r = 0.776 and n = 6. df = 6 - 2 = 4. The critical values are -0.811
and 0.811. Since -0.811 < 0.776 < 0.811, r is not significant and the line should not be used for
prediction.

Figure 12.14: -0.811 < r = 0.776 < 0.811. Therefore, r is not significant.


THIRD  EXAM  vs  FINAL  EXAM  EXAMPLE:  critical  value  method 

• Consider the third exam / final exam example.

• The line of best fit is: $\hat{y} = -173.51 + 4.83x$ with r = 0.6631 and there are n = 11 data points.

• Can the regression line be used for prediction? Given a third exam score (x value), can we use the
line to predict the final exam score (predicted y value)?

$H_0: \rho = 0$
$\alpha = 0.05$

Use the "95% Critical Value" table for r with df = n - 2 = 11 - 2 = 9
The critical values are -0.602 and +0.602
Since 0.6631 > 0.602, r is significant.
Decision: Reject $H_0$.

Conclusion: There is sufficient evidence to conclude that there is a significant linear relationship between
x and y because the correlation coefficient is significantly different from 0.

Because r is significant and the scatter plot shows a linear trend, the regression line can be used to
predict final exam scores.

Example  12.10:  Additional  Practice  Examples  using  Critical  Values 

Suppose  you  computed  the  following  correlation  coefficients.  Using  the  table  at  the  end  of  the 
chapter,  determine  if  r  is  significant  and  the  line  of  best  fit  associated  with  each  r  can  be  used  to 
predict  a  y  value.  If  it  helps,  draw  a  number  line. 

1. r = -0.567 and the sample size, n, is 19. The df = n - 2 = 17. The critical value is -0.456.
-0.567 < -0.456 so r is significant.

2.  r  =  0.708  and  the  sample  size,  n,  is  9.  The  df  =  n  —  2  =  7.  The  critical  value  is  0.666. 
0.708  >  0.666  so  r  is  significant. 

3.  r  =  0.134  and  the  sample  size,  n,  is  14.  The  df  =  14  —  2  =  12.  The  critical  value  is  0.532. 
0.134  is  between  -0.532  and  0.532  so  r  is  not  significant. 

4.  r  =  0  and  the  sample  size,  n,  is  5.  No  matter  what  the  dfs  are,  r  =  0  is  between  the  two 
critical  values  so  r  is  not  significant. 
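The critical value method amounts to comparing the absolute value of r with the table entry for df = n - 2. The sketch below encodes only the critical values quoted in the examples of this section, not the full table from Section 12.10. The dictionary name `CRITICAL_VALUES_95` and the function name `is_significant` are our own.

```python
# 95% critical values quoted in this section, keyed by degrees of freedom.
# This is a partial table for illustration only.
CRITICAL_VALUES_95 = {4: 0.811, 7: 0.666, 8: 0.632, 9: 0.602, 12: 0.532, 17: 0.456}

def is_significant(r, n, table=CRITICAL_VALUES_95):
    """Return True if r lies outside the interval between the critical values."""
    df = n - 2
    if df not in table:
        raise KeyError(f"no critical value stored for df = {df}")
    return abs(r) > table[df]

examples = [(0.801, 10), (-0.624, 14), (0.776, 6), (0.6631, 11),
            (-0.567, 19), (0.708, 9), (0.134, 14)]
for r, n in examples:
    verdict = "significant" if is_significant(r, n) else "not significant"
    print(f"r = {r}, n = {n}: {verdict}")
```

Using the absolute value of r handles the positive and negative critical values in one comparison, since the two are symmetric about 0.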
12.7.2 Assumptions in Testing the Significance of the Correlation Coefficient

Testing the significance of the correlation coefficient requires that certain assumptions about the data are
satisfied. The premise of this test is that the data are a sample of observed points taken from a larger
population. We have not examined the entire population because it is not possible or feasible to do so. We
are examining the sample to draw a conclusion about whether the linear relationship that we see between
x and y in the sample data provides strong enough evidence so that we can conclude that there is a linear
relationship between x and y in the population.

The regression line equation that we calculate from the sample data gives the best fit line for our particular
sample. We want to use this best fit line for the sample as an estimate of the best fit line for the population.
Examining  the  scatterplot  and  testing  the  significance  of  the  correlation  coefficient  helps  us  determine  if  it 
is  appropriate  to  do  this. 

The  assumptions  underlying  the  test  of  significance  are: 

•  There is a linear relationship in the population that models the average value of y for varying values
of x. In other words, the expected value of y for each particular value of x lies on a straight line in the
population. (We do not know the equation for the line for the population. Our regression line from
the sample is our best estimate of this line in the population.)

•  The y values for any particular x value are normally distributed about the line. This implies that
there are more y values scattered closer to the line than are scattered farther away. The first assumption
above implies that these normal distributions are centered on the line: the means of these normal
distributions of y values lie on the line.

•  The standard deviations of the population y values about the line are equal for each value of x. In
other words, each of these normal distributions of y values has the same shape and spread about the
line.

•  The  residual  errors  are  mutually  independent  (no  pattern). 
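
One way to picture these assumptions is to simulate a population that satisfies all of them. The sketch below is illustrative only: the function name `simulate_population` and the population values a = 10, b = 2, and sigma = 5 are hypothetical and do not come from the text.

```python
import random

def simulate_population(a, b, sigma, x_values, draws_per_x, seed=0):
    """Draw y values that satisfy the assumptions: for every x, y is normal
    with mean a + b*x (on the line) and the same standard deviation sigma,
    and each draw is independent of the others."""
    rng = random.Random(seed)
    return {x: [rng.gauss(a + b * x, sigma) for _ in range(draws_per_x)]
            for x in x_values}

# Hypothetical population line and spread, chosen only for illustration.
a, b, sigma = 10.0, 2.0, 5.0
sample = simulate_population(a, b, sigma, x_values=[1, 2, 3, 4], draws_per_x=2000)

for x, ys in sample.items():
    mean_y = sum(ys) / len(ys)
    sd_y = (sum((y - mean_y) ** 2 for y in ys) / (len(ys) - 1)) ** 0.5
    print(f"x = {x}: mean of y = {mean_y:.2f} (line gives {a + b * x:.2f}), sd = {sd_y:.2f}")
```

For each x, the simulated mean should sit close to the line and the standard deviation should be close to the same value, which is what the second and third assumptions describe.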


Figure 12.15: The y values for each x value are normally distributed about the line with the same standard
deviation. For each x value, the mean of the y values lies on the regression line. More y values lie near the
line than are scattered further away from the line.


'With  contributions  from  Roberta  Bloom 
12.8 Prediction

Recall the third exam/final exam example.

We examined the scatterplot and showed that the correlation coefficient is significant. We found the
equation of the best fit line for the final exam grade as a function of the grade on the third exam. We can
now use the least squares regression line for prediction.

Suppose you want to estimate, or predict, the final exam score of statistics students who received 73 on the
third exam. The exam scores (x-values) range from 65 to 75. Since 73 is between the x-values 65 and 75,
substitute x = 73 into the equation. Then:

ŷ = -173.51 + 4.83(73) = 179.08   (12.8)

We predict that statistics students who earn a grade of 73 on the third exam will earn a grade of 179.08 on
the final exam, on average.
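
The substitution above can be written as a small function. This is a minimal sketch, not the text's own method: the name `predict_final` is hypothetical, and raising an error for x values outside 65 to 75 is a design choice made here to reflect the remark that 73 lies inside the range of observed x-values.

```python
def predict_final(third_exam, a=-173.51, b=4.83, x_min=65, x_max=75):
    """Predict the final exam score from the third exam score.

    Raises ValueError outside the observed x range, because the line was
    fitted only to third exam scores between x_min and x_max."""
    if not x_min <= third_exam <= x_max:
        raise ValueError(
            f"x = {third_exam} is outside the observed range {x_min} to {x_max}")
    return a + b * third_exam

prediction = predict_final(73)
print(round(prediction, 2))
```

Calling `predict_final(66)` reproduces the 145.27 given in Example 12.11, Problem 1, after rounding to two decimal places.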

Example  12.11 

Recall  the  third  exam/ final  exam  example. 
Problem  1 

What  would  you  predict  the  final  exam  score  to  be  for  a  student  who  scored  a  66  on  the  third 
exam? 

Solution 

145.27 


Problem  2  (Solution  on  p.  579.) 

What  would  you  predict  the  final  exam  score  to  be  for  a  student  who  scored  a  90  on  the  third 
exam? 

**With  contributions  from  Roberta  Bloom 

12.9 Outliers

In  some  data  sets,  there  are  values  (observed  data  points)  called  outliers.  Outliers  are  observed  data 
points  that  are  far  from  the  least  squares  line.  They  have  large  "errors",  where  the  "error"  or  residual  is  the 
vertical  distance  from  the  line  to  the  point. 

Outliers need to be examined closely. Sometimes, for some reason or another, they should not be included
in the analysis of the data. It is possible that an outlier is a result of erroneous data. Other times, an outlier
may hold valuable information about the population under study and should remain included in the data.
The key is to carefully examine what causes a data point to be an outlier.

Besides  outliers,  a  sample  may  contain  one  or  a  few  points  that  are  called  influential  points.  Influential 
points  are  observed  data  points  that  are  far  from  the  other  observed  data  points  in  the  horizontal  direction. 
These  points  may  have  a  big  effect  on  the  slope  of  the  regression  line.  To  begin  to  identify  an  influential 
point,  you  can  remove  it  from  the  data  set  and  see  if  the  slope  of  the  regression  line  is  changed  significantly. 


*This content is available online at <http://cnx.org/content/m17095/1.8/>.
^This content is available online at <http://cnx.org/content/m17094/1.14/>.

Computers and many calculators can be used to identify outliers from the data. Computer output for
regression analysis will often identify both outliers and influential points so that you can examine them.

Identifying  Outliers 

We could guess at outliers by looking at a graph of the scatterplot and best fit line. However, we would like
some guideline as to how far away a point needs to be in order to be considered an outlier. As a rough rule
of thumb, we can flag any point that is located further than two standard deviations above or below the
best fit line as an outlier. The standard deviation used is the standard deviation of the residuals or errors.

We can do this visually in the scatterplot by drawing an extra pair of lines that are two standard deviations
above and below the best fit line. Any data points that are outside this extra pair of lines are flagged as
potential outliers. Or we can do this numerically by calculating each residual and comparing it to twice the
standard deviation. On the TI-83, 83+, or 84+, the graphical approach is easier. The graphical procedure
is shown first, followed by the numerical calculations. You would generally only need to use one of these
methods.

Example  12.12 

In  the  third  exam/ final  exam  example,  you  can  determine  if  there  is  an  outlier  or  not.  If  there  is 
an  outlier,  as  an  exercise,  delete  it  and  fit  the  remaining  data  to  a  new  line.  For  this  example,  the 
new  line  ought  to  fit  the  remaining  data  better.  This  means  the  SSE  should  be  smaller  and  the 
correlation  coefficient  ought  to  be  closer  to  1  or  -1. 

Solution 

Graphical  Identification  of  Outliers 

With the TI-83, 83+, 84+ graphing calculators, it is easy to identify the outlier graphically and
visually. If we were to measure the vertical distance from any data point to the corresponding point
on the line of best fit and that distance was equal to 2s or farther, then we would consider the data
point to be "too far" from the line of best fit. We need to find and graph the lines that are two
standard deviations below and above the regression line. Any points that are outside these two
lines are outliers. We will call these lines Y2 and Y3:

As we did with the equation of the regression line and the correlation coefficient, we will use
technology to calculate this standard deviation for us. Using the LinRegTTest with this data,
scroll down through the output screens to find s = 16.412.

Line Y2 = -173.5 + 4.83x - 2(16.4) and line Y3 = -173.5 + 4.83x + 2(16.4)

where ŷ = -173.5 + 4.83x is the line of best fit. Y2 and Y3 have the same slope as the line of
best fit.

Graph the scatterplot with the best fit line in equation Y1, then enter the two extra lines as Y2 and
Y3 in the "Y=" equation editor and press ZOOM 9. You will find that the only data point that is not
between lines Y2 and Y3 is the point x = 65, y = 175. On the calculator screen it is just barely outside
these lines. The outlier is the student who had a grade of 65 on the third exam and 175 on the final
exam; this point is further than 2 standard deviations away from the best fit line.

Sometimes a point is so close to the lines used to flag outliers on the graph that it is difficult to tell
if the point is between or outside the lines. On a computer, enlarging the graph may help; on a
small calculator screen, zooming in may make the graph clearer. Note that when the graph does
not give a clear enough picture, you can use the numerical comparisons to identify outliers.
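
When the calculator screen is too small to judge, the same Y2 and Y3 check can be done in code. The sketch below uses the rounded values from the text (intercept -173.5, slope 4.83, s = 16.4) and the eleven data points of the third exam/final exam example; the function names `y1`, `y2`, and `y3` simply mirror the calculator's equation names and are otherwise an assumption of this example.

```python
# Third exam (x) and final exam (y) scores from the example.
data = [(65, 175), (67, 133), (71, 185), (71, 163), (66, 126), (75, 198),
        (67, 153), (70, 163), (71, 159), (69, 151), (69, 159)]

A, B, S = -173.5, 4.83, 16.4   # rounded intercept, slope, and s from the text

def y1(x):
    """Line of best fit."""
    return A + B * x

def y2(x):
    """Line two standard deviations below the line of best fit."""
    return y1(x) - 2 * S

def y3(x):
    """Line two standard deviations above the line of best fit."""
    return y1(x) + 2 * S

# A point is flagged when it is not between Y2 and Y3.
outside = [(x, y) for x, y in data if not y2(x) <= y <= y3(x)]
print(outside)
```

At x = 65 the upper line Y3 is at 173.25, so the point (65, 175) is only 1.75 points above it, which matches the remark that it is just barely outside the lines.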

Figure 12.16


Numerical  Identification  of  Outliers 

In the table below, the first two columns are the third exam and final exam data. The third
column shows the predicted ŷ values calculated from the line of best fit: ŷ = -173.5 + 4.83x. The
residuals, or errors, have been calculated in the fourth column of the table: observed y value -
predicted y value = y - ŷ.

s is the standard deviation of all the y - ŷ = ε values, where n = the total number of data points. If
each residual is calculated and squared, and the results are added, we get the SSE. The standard
deviation of the residuals is calculated from the SSE as:

$$ s = \sqrt{\frac{SSE}{n - 2}} $$

Rather  than  calculate  the  value  of  s  ourselves,  we  can  find  s  using  the  computer  or  calculator.  For 
this  example,  the  calculator  function  LinRegTTest  found  s  =  16.4  as  the  standard  deviation  of  the 
residuals  35;  -17;  16;  -6;  -19;  9;  3;  -1;  -10;  -9;  -1. 

x     y      ŷ      y - ŷ
65    175    140    175 - 140 = 35
67    133    150    133 - 150 = -17
71    185    169    185 - 169 = 16
71    163    169    163 - 169 = -6
66    126    145    126 - 145 = -19
75    198    189    198 - 189 = 9
67    153    150    153 - 150 = 3
70    163    164    163 - 164 = -1
71    159    169    159 - 169 = -10
69    151    160    151 - 160 = -9
69    159    160    159 - 160 = -1

Table 12.1


We  are  looking  for  all  data  points  for  which  the  residual  is  greater  than  2s=2(16.4)=32.8  or  less  than 
-32.8.  Compare  these  values  to  the  residuals  in  column  4  of  the  table.  The  only  such  data  point  is 
the  student  who  had  a  grade  of  65  on  the  third  exam  and  175  on  the  final  exam;  the  residual  for 
this  student  is  35. 
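
The numerical check can also be carried out without a calculator. The sketch below is an illustration under stated assumptions: it fits the least squares line directly from the eleven data points with the usual formulas, rather than using the rounded table entries, and the helper name `least_squares` is hypothetical.

```python
from math import sqrt

# Third exam (x) and final exam (y) scores from Table 12.1.
data = [(65, 175), (67, 133), (71, 185), (71, 163), (66, 126), (75, 198),
        (67, 153), (70, 163), (71, 159), (69, 151), (69, 159)]

def least_squares(points):
    """Return (intercept, slope) of the least squares line y-hat = a + b*x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    return mean_y - b * mean_x, b

a, b = least_squares(data)
residuals = [y - (a + b * x) for x, y in data]   # y - y-hat for each point
sse = sum(e ** 2 for e in residuals)             # sum of squared errors
s = sqrt(sse / (len(data) - 2))                  # standard deviation of residuals

# Flag every point whose residual is more than 2s above or below the line.
flagged = [pt for pt, e in zip(data, residuals) if abs(e) > 2 * s]
print(f"s = {s:.3f}")
print(flagged)
```

Because the residuals here are not rounded to whole numbers, individual values differ slightly from the fourth column of the table, but the same single point is flagged.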

How  does  the  outlier  affect  the  best  fit  line? 

Numerically and graphically, we have identified the point (65, 175) as an outlier. We should
re-examine the data for this point to see if there are any problems with the data. If there is an error,
we should fix the error if possible, or delete the data. If the data is correct, we would leave it in
the data set. For this problem, we will suppose that we examined the data and found that this
outlier data was an error. Therefore, we will continue on and delete the outlier, so that we can
explore how it affects the results, as a learning experience.

Compute  a  new  best-fit  line  and  correlation  coefficient  using  the  10  remaining  points: 

On the TI-83, TI-83+, TI-84+ calculators, delete the outlier from L1 and L2. Using the LinRegTTest,
the new line of best fit and the correlation coefficient are:

ŷ = -355.19 + 7.39x and r = 0.9121

The new value r = 0.9121 indicates a stronger correlation than the original (r = 0.6631) because
0.9121 is closer to 1. This means that the new line is a better fit to the 10 remaining data values.
The line can better predict the final exam score given the third exam score.
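
The effect of deleting the outlier can be reproduced in code. This is a sketch, assuming the standard least squares and correlation formulas; the helper name `fit_with_r` is hypothetical and is not a calculator or library function.

```python
from math import sqrt

data = [(65, 175), (67, 133), (71, 185), (71, 163), (66, 126), (75, 198),
        (67, 153), (70, 163), (71, 159), (69, 151), (69, 159)]

def fit_with_r(points):
    """Return (intercept, slope, r) for the least squares line through points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / sqrt(sxx * syy)
    return intercept, slope, r

# Fit once with all 11 points, then again with the outlier (65, 175) removed.
a_old, b_old, r_old = fit_with_r(data)
remaining = [pt for pt in data if pt != (65, 175)]
a_new, b_new, r_new = fit_with_r(remaining)

print(f"all points:      y-hat = {a_old:.2f} + {b_old:.2f}x, r = {r_old:.4f}")
print(f"outlier removed: y-hat = {a_new:.2f} + {b_new:.2f}x, r = {r_new:.4f}")
```

Comparing the two fitted slopes is also the check suggested earlier for influential points: remove the point and see how much the line changes.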


Numerical  Identification  of  Outliers:  Calculating  s  and  Finding  Outliers  Manually 

If you do not have the function LinRegTTest, then you can find the outlier in the first example by
doing the following.

First, square each |y - ŷ| (see Table 12.1 above):

The squares are 35^2; 17^2; 16^2; 6^2; 19^2; 9^2; 3^2; 1^2; 10^2; 9^2; 1^2

Then, add (sum) all the |y - ŷ| squared terms using the formula

$$ \sum_{i=1}^{11} \left( |y_i - \hat{y}_i| \right)^2 = \sum_{i=1}^{11} \varepsilon_i^2 $$   (Recall that y_i - ŷ_i = ε_i.)

= 35^2 + 17^2 + 16^2 + 6^2 + 19^2 + 9^2 + 3^2 + 1^2 + 10^2 + 9^2 + 1^2
= 2440 = SSE. The result, SSE, is the Sum of Squared Errors.

Next, calculate s, the standard deviation of all the y - ŷ = ε values, where n = the total number of data
points.

The calculation is $s = \sqrt{\frac{SSE}{n - 2}}$

For the third exam/final exam problem, $s = \sqrt{\frac{2440}{11 - 2}} = 16.47$

Next,  multiply  s  by  1.9: 
(1.9)  •  (16.47)  =  31.29 

31.29 is almost 2 standard deviations away from the mean of the y - ŷ values.

If we were to measure the vertical distance from any data point to the corresponding point on the line of
best fit and that distance is at least 1.9s, then we would consider the data point to be "too far" from the line
of best fit. We call that point a potential outlier.

For the example, if any of the |y - ŷ| values are at least 31.29, the corresponding (x, y) data point is a
potential outlier.

For the third exam/final exam problem, all the |y - ŷ|'s are less than 31.29 except for the first one, which is
35.

35 > 31.29. That is, |y - ŷ| > (1.9) • (s)

The point which corresponds to |y - ŷ| = 35 is (65, 175). Therefore, the data point (65, 175) is a potential
outlier. For this example, we will delete it. (Remember, we do not always delete an outlier.)
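
The manual procedure above can be checked step by step in code. This sketch uses the rounded residuals from Table 12.1 and rounds s to two decimal places before multiplying by 1.9, because that is how the text arrives at 31.29; the variable names are assumptions of this example.

```python
from math import sqrt

# Rounded residuals y - y-hat from the fourth column of Table 12.1.
residuals = [35, -17, 16, -6, -19, 9, 3, -1, -10, -9, -1]
n = len(residuals)

sse = sum(e ** 2 for e in residuals)      # sum of squared errors
s = round(sqrt(sse / (n - 2)), 2)         # rounded to two places, as in the text
cutoff = 1.9 * s                          # distance that counts as "too far"

# Residuals whose absolute value is at least the cutoff mark potential outliers.
potential_outliers = [e for e in residuals if abs(e) >= cutoff]
print(sse, s, round(cutoff, 2), potential_outliers)
```

The value of s obtained this way (16.47) is slightly larger than the calculator's 16.4 because the residuals in the table were rounded to whole numbers before squaring.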

The  next  step  is  to  compute  a  new  best-fit  line  using  the  10  remaining  points.  The  new  line  of  best 
fit  and  the  correlation  coefficient  are: 

ŷ = -355.19 + 7.39x and r = 0.9121

Example 12.13

Using  this  new  line  of  best  fit  (based  on  the  remaining  10  data  points),  what  would  a  student 
who  receives  a  73  on  the  third  exam  expect  to  receive  on  the  final  exam?  Is  this  the  same  as  the 
prediction  made  using  the  original  line? 

Solution

Using the new line of best fit, ŷ = -355.19 + 7.39(73) = 184.28. A student who scored 73 points
on the third exam would expect to earn 184 points on the final exam.

The original line predicted ŷ = -173.51 + 4.83(73) = 179.08, so the prediction using the
new line with the outlier eliminated differs from the original prediction.


Example  12.14 

(From The Consumer Price Indexes Web site) The Consumer Price Index (CPI) measures the
average change over time in the prices paid by urban consumers for consumer goods and services. The
CPI affects nearly all Americans because of the many ways it is used. One of its biggest uses is as
a measure of inflation. By providing information about price changes in the Nation's economy to
government, business, and labor, the CPI helps them to make economic decisions. The President,
Congress, and the Federal Reserve Board use the CPI's trends to formulate monetary and fiscal
policies. In the following table, x is the year and y is the CPI.

Data: 


x       y
1915    10.1
1926    17.7
1935    13.7
1940    14.7
1947    24.1
1952    26.5
1964    31.0
1969    36.7
1975    49.3
1979    72.6
1980    82.4
1986    109.6
1991    130.7
1999    166.6

Table 12.2


Problem 

•  Make  a  scatterplot  of  the  data. 

•  Calculate the least squares line. Write the equation in the form ŷ = a + bx.

•  Draw  the  line  on  the  scatterplot. 

•  Find  the  correlation  coefficient.  Is  it  significant? 

•  What  is  the  average  CPI  for  the  year  1990? 

Solution

•  Scatter plot and line of best fit.

•  ŷ = -3204 + 1.662x is the equation of the line of best fit.

•  r  =  0.8694 

•  The number of data points is n = 14. Use the 95% Critical Values of the Sample Correlation
Coefficient table at the end of Chapter 12. n - 2 = 12. The corresponding critical value is
0.532. Since 0.8694 > 0.532, r is significant.

•  ŷ = -3204 + 1.662(1990) = 103.4 CPI

•  Using the calculator LinRegTTest, we find that s = 25.4; graphing the lines Y2 = -3204 + 1.662X -
2(25.4) and Y3 = -3204 + 1.662X + 2(25.4) shows that no data values are outside those lines,
identifying no outliers. (Note that the year 1999 was very close to the upper line, but still inside
it.)
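
The figures in this solution can be reproduced from Table 12.2. The sketch below is illustrative: it applies the standard least squares, correlation, and residual standard deviation formulas, and the helper name `regression_summary` is hypothetical. Values are computed around the means to limit rounding error with x values near 2000.

```python
from math import sqrt

# (year, CPI) pairs from Table 12.2.
cpi = [(1915, 10.1), (1926, 17.7), (1935, 13.7), (1940, 14.7), (1947, 24.1),
       (1952, 26.5), (1964, 31.0), (1969, 36.7), (1975, 49.3), (1979, 72.6),
       (1980, 82.4), (1986, 109.6), (1991, 130.7), (1999, 166.6)]

def regression_summary(points):
    """Return intercept a, slope b, correlation r, and residual standard deviation s."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / sqrt(sxx * syy)
    sse = sum((y - (a + b * x)) ** 2 for x, y in points)
    s = sqrt(sse / (n - 2))
    return a, b, r, s

a, b, r, s = regression_summary(cpi)

# Points lying more than 2s above or below the fitted line.
outliers = [(x, y) for x, y in cpi if abs(y - (a + b * x)) > 2 * s]

print(f"y-hat = {a:.0f} + {b:.3f}x, r = {r:.4f}, s = {s:.1f}")
print("points beyond 2s:", outliers)
```

A prediction for 1990 made with the unrounded coefficients will differ a little from the 103.4 shown above, which uses the rounded intercept and slope.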


Figure  12.17 


NOTE: In the example, notice the pattern of the points compared to the line. Although the
correlation coefficient is significant, the pattern in the scatterplot indicates that a curve would be a more
appropriate  model  to  use  than  a  line.  In  this  example,  a  statistician  should  prefer  to  use  other 
methods  to  fit  a  curve  to  this  data,  rather  than  model  the  data  with  the  line  we  found.  In  addition 
to  doing  the  calculations,  it  is  always  important  to  look  at  the  scatterplot  when  deciding  whether 
a  linear  model  is  appropriate. 

If you are interested in seeing more years of data, visit the Bureau of Labor Statistics CPI website
ftp://ftp.bls.gov/pub/special.requests/cpi/cpiai.txt; our data is taken from the column entitled
"Annual Avg." (third column from the right). For example, you could add more current years of
data. Try adding the more recent years 2004: CPI = 188.9, 2008: CPI = 215.3, and 2011: CPI = 224.9.
See how it affects the model. (Check: ŷ = -4436 + 2.295x. r = 0.9018. Is r significant? Is the fit
better with the addition of the new points?)

With  contributions  from  Roberta  Bloom 

12.10 95% Critical Values of the Sample Correlation Coefficient Table


This content is available online at <http://cnx.org/content/m17098/1.6/>.

Degrees of Freedom: n - 2    Critical Values: (+ and -)
1                            0.997
2                            0.950
3                            0.878
4                            0.811
5                            0.754
6                            0.707
7                            0.666
8                            0.632
9                            0.602
10                           0.576
11                           0.555
12                           0.532
13                           0.514
14                           0.497
15                           0.482
16                           0.468
17                           0.456
18                           0.444
19                           0.433
20                           0.423
21                           0.413
22                           0.404
23                           0.396
24                           0.388
25                           0.381
26                           0.374
27                           0.367
28                           0.361
29                           0.355
30                           0.349
40                           0.304
50                           0.273
60                           0.250
70                           0.232
80                           0.217
90                           0.205
100                          0.195

Table 12.3
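
The table entries are consistent with the usual t test for a correlation coefficient: if t* is the two-tailed 5% critical value of Student's t distribution with df = n - 2, the matching critical value of r is t* / sqrt(df + t*^2). The sketch below checks three entries. The t values are standard table values rounded to three decimal places and are not taken from this chapter, and the name `critical_r` is hypothetical, so treat the block as an illustration under those assumptions.

```python
from math import sqrt

# Two-tailed 5% critical values of Student's t (standard table values, rounded).
T_CRITICAL = {7: 2.365, 12: 2.179, 17: 2.110}

def critical_r(df, t_star):
    """Convert a t critical value into the matching critical value for r."""
    return t_star / sqrt(df + t_star ** 2)

computed = {df: round(critical_r(df, t), 3) for df, t in T_CRITICAL.items()}
print(computed)
```

The three computed values agree with the rows for 7, 12, and 17 degrees of freedom in Table 12.3.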

12.11 Summary

Bivariate Data: Each data point has two values. The form is (x, y).

Line of Best Fit or Least Squares Line (LSL): ŷ = a + bx
x = independent variable; y = dependent variable

Residual: Actual y value - predicted y value = y - ŷ

Correlation Coefficient r:

1.  Used  to  determine  whether  a  line  of  best  fit  is  good  for  prediction. 

2. Between -1 and 1 inclusive. The closer r is to 1 or -1, the closer the original points are to a straight line.

3.  If  r  is  negative,  the  slope  is  negative.  If  r  is  positive,  the  slope  is  positive. 

4.  If  r  =  0,  then  the  line  is  horizontal. 

Sum  of  Squared  Errors  (SSE):  The  smaller  the  SSE,  the  better  the  original  set  of  points  fits  the  line  of  best 
fit. 

Outlier:  A  point  that  does  not  seem  to  fit  the  rest  of  the  data. 


This content is available online at <http://cnx.org/content/m17081/1.4/>.

12.12 Practice: Linear Regression

12.12.1  Student  Learning  Outcomes 

•  The  student  will  evaluate  bivariate  data  and  determine  if  a  line  is  an  appropriate  fit  to  the  data. 

12.12.2  Given 

Below are real data for the first two decades of AIDS reporting. (Source: Centers for Disease Control and
Prevention, National Center for HIV, STD, and TB Prevention)

Adults and Adolescents only, United States


Year        # AIDS cases diagnosed    # AIDS deaths
Pre-1981    91                        29
1981        319                       121
1982        1,170                     453
1983        3,076                     1,482
1984        6,240                     3,466
1985        11,776                    6,878
1986        19,032                    11,987
1987        28,564                    16,162
1988        35,447                    20,868
1989        42,674                    27,591
1990        48,634                    31,335
1991        59,660                    36,560
1992        78,530                    41,055
1993        78,834                    44,730
1994        71,874                    49,095
1995        68,505                    49,456
1996        59,347                    38,510
1997        47,149                    20,736
1998        38,393                    19,005
1999        25,174                    18,454
2000        25,522                    17,347
2001        25,643                    17,402
2002        26,464                    16,371
Total       802,118                   489,093

Table 12.4

This content is available online at <http://cnx.org/content/m17088/1.12/>.

NOTE: We will use the columns "year" and "# AIDS cases diagnosed" for all questions unless
otherwise stated.


12.12.3  Graphing 

Graph "year" vs. "# AIDS cases diagnosed." Plot the points on the graph located below in the section
titled "Plot". Do not include pre-1981. Label both axes with words. Scale both axes.

12.12.4  Data 

Exercise  12.12.1 

Enter  your  data  into  your  calculator  or  computer.  The  pre-1981  data  should  not  be  included.  Why 
is  that  so? 


12.12.5  Linear  Equation 

Write  the  linear  equation  below,  rounding  to  4  decimal  places: 

NOTE:  For  any  prediction  questions,  the  answers  are  calculated  using  the  least  squares  (best  fit) 
line  equation  cited  in  the  solution. 

Exercise  12.12.2  (Solution  on  p.  579.) 

Calculate the following:

a. a =

b. b =

c. corr. =

d. n = (# of pairs)

Exercise 12.12.3 (Solution on p. 579.)

equation: ŷ =


12.12.6  Solve 

Exercise  12.12.4  (Solution  on  p.  579.) 

Solve.

a. When x = 1985, ŷ =

b. When x = 1990, ŷ =

12.12.7 Plot

Plot  the  2  above  points  on  the  graph  below.  Then,  connect  the  2  points  to  form  the  regression  line. 


Obtain  the  graph  on  your  calculator  or  computer. 

12.12.8  Discussion  Questions 

Look  at  the  graph  above. 
Exercise  12.12.5 

Does  the  line  seem  to  fit  the  data?  Why  or  why  not? 
Exercise  12.12.6 

Do  you  think  a  linear  fit  is  best?  Why  or  why  not? 
Exercise  12.12.7 

Hand draw a smooth curve on the graph above that shows the flow of the data.
Exercise  12.12.8 

What  does  the  correlation  imply  about  the  relationship  between  time  (years)  and  the  number  of 
diagnosed  AIDS  cases  reported  in  the  U.S.? 

Exercise  12.12.9 

Why  is  "year"  the  independent  variable  and  "#  AIDS  cases  diagnosed."  the  dependent  variable 
(instead  of  the  reverse)? 

Exercise  12.12.10  (Solution  on  p.  579.) 

Solve. 

a. When x = 1970, ŷ =

b.  Why  doesn't  this  answer  make  sense? 

12.13 Homework

Exercise  12.13.1  (Solution  on  p.  579.) 

For  each  situation  below,  state  the  independent  variable  and  the  dependent  variable. 

a. A study is done to determine if elderly drivers are involved in more motor vehicle fatalities
than all other drivers. The number of fatalities per 100,000 drivers is compared to the age of
drivers.

b. A study is done to determine if the weekly grocery bill changes based on the number of family
members.

c. Insurance companies base life insurance premiums partially on the age of the applicant.

d. Utility bills vary according to power consumption.

e. A study is done to determine if a higher education reduces the crime rate in a population.


NOTE:  For  any  prediction  questions,  the  answers  are  calculated  using  the  least  squares  (best  fit) 
line  equation  cited  in  the  solution. 

Exercise  12.13.2 

Recently, the annual number of driver deaths per 100,000
for the selected age groups was as follows (Source:
http://www.census.gov/compendia/statab/cats/transportation/motor_vehicle_accidents_and_fatalities.html):


Age      Number of Driver Deaths per 100,000
16-19    38
20-24    36
25-34    24
35-54    20
55-74    18
75+      28

Table 12.5


a. For each age group, pick the midpoint of the interval for the x value. (For the 75+ group, use
80.)

b. Using "ages" as the independent variable and "Number of driver deaths per 100,000" as the
dependent variable, make a scatter plot of the data.

c. Calculate the least squares (best-fit) line. Put the equation in the form of: ŷ = a + bx

d. Find the correlation coefficient. Is it significant?

e. Pick two ages and find the estimated fatality rates.

f. Use the two points in (e) to plot the least squares line on your graph from (b).

g. Based on the above data, is there a linear relationship between age of a driver and driver fatality
rate?

h. What is the slope of the least squares (best-fit) line? Interpret the slope.

This content is available online at <http://cnx.org/content/m17085/1.14/>.

http://www.census.gov/compendia/statab/cats/transportation/motor_vehicle_accidents_and_fatalities.html


Available  for  free  at  Connexions  <http://cnx.org/content/coll0522/1.40> 



Exercise  12.13.3  (Solution  on  p.  579.) 

The average number of people in a family that received welfare for various years is given below.
(Source: House Ways and Means Committee, Health and Human Services Department)


Year    Welfare family size
1969    4.0
1973    3.6
1975    3.2
1979    3.0
1983    3.0
1988    3.0
1991    2.9

Table 12.6


a. Using "year" as the independent variable and "welfare family size" as the dependent variable,
make a scatter plot of the data.

b. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

c. Find the correlation coefficient. Is it significant?

d. Pick two years between 1969 and 1991 and find the estimated welfare family sizes.

e. Use the two points in (d) to plot the least squares line on your graph from (a).

f. Based on the above data, is there a linear relationship between the year and the average number
of people in a welfare family?

g. Using the least squares line, estimate the welfare family sizes for 1960 and 1995. Does the least
squares line give an accurate estimate for those years? Explain why or why not.

h. Are there any outliers in the above data?

i. What is the estimated average welfare family size for 1986? Does the least squares line give an
accurate estimate for that year? Explain why or why not.

j. What is the slope of the least squares (best-fit) line? Interpret the slope.

Exercise  12.13.4 

Use the AIDS data from the practice for this section (Section 12.12.2: Given), but this time use the
columns "year #" and "# new AIDS deaths in U.S." Answer all of the questions from the practice
again, using the new columns.

Exercise  12.13.5  (Solution  on  p.  580.) 

The height (sidewalk to roof) of notable tall buildings in America is compared to the number of
stories of the building (beginning at street level). (Source: Microsoft Bookshelf)

Height (in feet)    Stories
1050                57
428                 28
362                 26
529                 40
790                 60
401                 22
380                 38
1454                110
1127                100
700                 46

Table 12.7


a. Using "stories" as the independent variable and "height" as the dependent variable, make a
scatter plot of the data.

b. Does it appear from inspection that there is a relationship between the variables?

c. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

d. Find the correlation coefficient. Is it significant?

e. Find the estimated heights for 32 stories and for 94 stories.

f. Use the two points in (e) to plot the least squares line on your graph from (a).

g. Based on the above data, is there a linear relationship between the number of stories in tall
buildings and the height of the buildings?

h. Are there any outliers in the above data? If so, which point(s)?

i. What is the estimated height of a building with 6 stories? Does the least squares line give an
accurate estimate of height? Explain why or why not.

j. Based on the least squares line, adding an extra story is predicted to add about how many feet
to a building?

k. What is the slope of the least squares (best-fit) line? Interpret the slope.

Exercise  12.13.6 

Below is the life expectancy for an individual born in the United States in certain years. (Source:
National Center for Health Statistics)

Year of Birth    Life Expectancy
1930             59.7
1940             62.9
1950             70.2
1965             69.7
1973             71.4
1982             74.5
1987             75
1992             75.7
2010             78.7

Table 12.8


a. Decide which variable should be the independent variable and which should be the dependent
variable.

b. Draw a scatter plot of the ordered pairs.

c. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

d. Find the correlation coefficient. Is it significant?

e. Find the estimated life expectancy for an individual born in 1950 and for one born in 1982.

f. Why aren't the answers to part (e) the values on the above chart that correspond to those years?

g. Use the two points in (e) to plot the least squares line on your graph from (b).

h. Based on the above data, is there a linear relationship between the year of birth and life
expectancy?

i. Are there any outliers in the above data?

j. Using the least squares line, find the estimated life expectancy for an individual born in 1850.
Does the least squares line give an accurate estimate for that year? Explain why or why not.

k. What is the slope of the least squares (best-fit) line? Interpret the slope.

Exercise  12.13.7  (Solution  on  p.  580.) 

The  percent  of  female  wage  and  salary  workers  who  are  paid  hourly  rates  is  given  below  for  the 
years  1979  - 1992.  (Source:  Bureau  of  Labor  Statistics,  U.S.  Dept.  of  Labor) 

Year    Percent of workers paid hourly rates
1979    61.2
1980    60.7
1981    61.3
1982    61.3
1983    61.8
1984    61.7
1985    61.8
1986    62.0
1987    62.7
1990    62.8
1992    62.9

Table 12.9


a. Using "year" as the independent variable and "percent" as the dependent variable, make a scatter plot of the data.

b. Does it appear from inspection that there is a relationship between the variables? Why or why not?

c. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

d.  Find  the  correlation  coefficient.  Is  it  significant? 

e.  Find  the  estimated  percents  for  1991  and  1988. 

f.  Use  the  two  points  in  (e)  to  plot  the  least  squares  line  on  your  graph  from  (b). 

g.  Based  on  the  above  data,  is  there  a  linear  relationship  between  the  year  and  the  percent  of 

female  wage  and  salary  earners  who  are  paid  hourly  rates? 

h.  Are  there  any  outliers  in  the  above  data? 

i. What is the estimated percent for the year 2050? Does the least squares line give an accurate estimate for that year? Explain why or why not.

j. What is the slope of the least squares (best-fit) line? Interpret the slope.

Exercise  12.13.8 

The maximum discount value of the Entertainment® card for the "Fine Dining" section, Edition
10, for various pages is given below.

Page number    Maximum value ($)
4              16
14             19
25             15
32             17
43             19
57             15
72             16
85             15
90             17

Table 12.10


a. Decide which variable should be the independent variable and which should be the dependent variable.

b. Draw a scatter plot of the ordered pairs.

c. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

d.  Find  the  correlation  coefficient.  Is  it  significant? 

e.  Find  the  estimated  maximum  values  for  the  restaurants  on  page  10  and  on  page  70. 

f.  Use  the  two  points  in  (e)  to  plot  the  least  squares  line  on  your  graph  from  (b). 

g. Does it appear that the restaurants giving the maximum value are placed in the beginning of the "Fine Dining" section? How did you arrive at your answer?

h. Suppose that there were 200 pages of restaurants. What do you estimate to be the maximum value for a restaurant listed on page 200?

i.  Is  the  least  squares  line  valid  for  page  200?  Why  or  why  not? 

j.  What  is  the  slope  of  the  least  squares  (best-fit)  line?  Interpret  the  slope. 

The  next  two  questions  refer  to  the  following  data:  The  cost  of  a  leading  liquid  laundry  detergent  in 
different  sizes  is  given  below. 


Size (ounces)    Cost ($)    Cost per ounce
16               3.99
32               4.99
64               5.99
200              10.99

Table 12.11


Exercise  12.13.9  (Solution  on  p.  580.) 

a. Using "size" as the independent variable and "cost" as the dependent variable, make a scatter plot.

b. Does it appear from inspection that there is a relationship between the variables? Why or why not?

c. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

d.  Find  the  correlation  coefficient.  Is  it  significant? 

e.  If  the  laundry  detergent  were  sold  in  a  40  ounce  size,  find  the  estimated  cost. 

f.  If  the  laundry  detergent  were  sold  in  a  90  ounce  size,  find  the  estimated  cost. 

g.  Use  the  two  points  in  (e)  and  (f)  to  plot  the  least  squares  line  on  your  graph  from  (a). 

h.  Does  it  appear  that  a  line  is  the  best  way  to  fit  the  data?  Why  or  why  not? 

i.  Are  there  any  outliers  in  the  above  data? 

j.  Is  the  least  squares  line  valid  for  predicting  what  a  300  ounce  size  of  the  laundry  detergent 

would  cost?  Why  or  why  not? 
k.  What  is  the  slope  of  the  least  squares  (best-fit)  line?  Interpret  the  slope. 

Exercise  12.13.10 

a.  Complete  the  above  table  for  the  cost  per  ounce  of  the  different  sizes. 

b. Using "Size" as the independent variable and "Cost per ounce" as the dependent variable, make a scatter plot of the data.

c. Does it appear from inspection that there is a relationship between the variables? Why or why not?

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. Is it significant?

f. If the laundry detergent were sold in a 40 ounce size, find the estimated cost per ounce.

g.  If  the  laundry  detergent  were  sold  in  a  90  ounce  size,  find  the  estimated  cost  per  ounce. 

h.  Use  the  two  points  in  (f)  and  (g)  to  plot  the  least  squares  line  on  your  graph  from  (b). 

i.  Does  it  appear  that  a  line  is  the  best  way  to  fit  the  data?  Why  or  why  not? 
j.  Are  there  any  outliers  in  the  above  data? 

k. Is the least squares line valid for predicting what a 300 ounce size of the laundry detergent would cost per ounce? Why or why not?

l. What is the slope of the least squares (best-fit) line? Interpret the slope.

Exercise  12.13.11  (Solution  on  p.  580.) 

According to a flyer by a Prudential Insurance Company representative, the approximate
probate fees and taxes for selected net taxable estates are as follows:

Net Taxable Estate ($)    Approximate Probate Fees and Taxes ($)
600,000                   30,000
750,000                   92,500
1,000,000                 203,000
1,500,000                 438,000
2,000,000                 688,000
2,500,000                 1,037,000
3,000,000                 1,350,000

Table 12.12


a. Decide which variable should be the independent variable and which should be the dependent variable.

b. Make a scatter plot of the data.

c. Does it appear from inspection that there is a relationship between the variables? Why or why not?

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e.  Find  the  correlation  coefficient.  Is  it  significant? 

f.  Find  the  estimated  total  cost  for  a  net  taxable  estate  of  $1,000,000.  Find  the  cost  for  $2,500,000. 

g.  Use  the  two  points  in  (f)  to  plot  the  least  squares  line  on  your  graph  from  (b). 

h.  Does  it  appear  that  a  line  is  the  best  way  to  fit  the  data?  Why  or  why  not? 

i.  Are  there  any  outliers  in  the  above  data? 

j.  Based  on  the  above,  what  would  be  the  probate  fees  and  taxes  for  an  estate  that  does  not  have 

any  assets? 

k.  What  is  the  slope  of  the  least  squares  (best-fit)  line?  Interpret  the  slope. 
Exercise  12.13.12 

The  following  are  advertised  sale  prices  of  color  televisions  at  Anderson's. 


Size (inches)    Sale Price ($)
9                147
20               197
27               297
31               447
35               1177
40               2177
60               2497

Table 12.13


a. Decide which variable should be the independent variable and which should be the dependent variable.

b. Make a scatter plot of the data.

c. Does it appear from inspection that there is a relationship between the variables? Why or why not?

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e.  Find  the  correlation  coefficient.  Is  it  significant? 

f.  Find  the  estimated  sale  price  for  a  32  inch  television.  Find  the  cost  for  a  50  inch  television. 

g.  Use  the  two  points  in  (f)  to  plot  the  least  squares  line  on  your  graph  from  (b). 

h.  Does  it  appear  that  a  line  is  the  best  way  to  fit  the  data?  Why  or  why  not? 

i.  Are  there  any  outliers  in  the  above  data? 

j.  What  is  the  slope  of  the  least  squares  (best-fit)  line?  Interpret  the  slope. 

Exercise  12.13.13  (Solution  on  p.  580.) 

Below  are  the  average  heights  for  American  boys.  (Source:  Physician's  Handbook,  1990) 

Age (years)    Height (cm)
birth          50.8
2              83.8
3              91.4
5              106.6
7              119.3
10             137.1
14             157.5

Table 12.14


a.  Decide  which  variable  should  be  the  independent  variable  and  which  should  be  the  dependent 

variable. 

b.  Make  a  scatter  plot  of  the  data. 

c. Does it appear from inspection that there is a relationship between the variables? Why or why not?

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e.  Find  the  correlation  coefficient.  Is  it  significant? 

f.  Find  the  estimated  average  height  for  a  one  year-old.  Find  the  estimated  average  height  for  an 

eleven  year-old. 

g.  Use  the  two  points  in  (f)  to  plot  the  least  squares  line  on  your  graph  from  (b). 

h.  Does  it  appear  that  a  line  is  the  best  way  to  fit  the  data?  Why  or  why  not? 

i.  Are  there  any  outliers  in  the  above  data? 

j.  Use  the  least  squares  line  to  estimate  the  average  height  for  a  sixty-two  year-old  man.  Do  you 

think  that  your  answer  is  reasonable?  Why  or  why  not? 
k.  What  is  the  slope  of  the  least  squares  (best-fit)  line?  Interpret  the  slope. 

Exercise  12.13.14 

The following chart gives the gold medal times for every other Summer Olympics for the women's
100 meter freestyle (swimming).

Year    Time (seconds)
1912    82.2
1924    72.4
1932    66.8
1952    66.8
1960    61.2
1968    60.0
1976    55.65
1984    55.92
1992    54.64
2000    53.8
2008    53.1

Table 12.15

a.  Decide  which  variable  should  be  the  independent  variable  and  which  should  be  the  dependent 

variable. 

b.  Make  a  scatter  plot  of  the  data. 

c. Does it appear from inspection that there is a relationship between the variables? Why or why not?

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. Is the decrease in times significant?

f. Find the estimated gold medal time for 1932. Find the estimated time for 1984.

g. Why are the answers from (f) different from the chart values?

h. Use the two points in (f) to plot the least squares line on your graph from (b).

i. Does it appear that a line is the best way to fit the data? Why or why not?

j. Use the least squares line to estimate the gold medal time for the next Summer Olympics. Do you think that your answer is reasonable? Why or why not?

The  next  three  questions  use  the  following  state  information. 


State            # letters in name    Year entered the Union    Rank for entering the Union    Area (square miles)
Alabama          7                    1819                      22                             52,423
Colorado                              1876                      38                             104,100
Hawaii                                1959                      50                             10,932
Iowa                                  1846                      29                             56,276
Maryland                              1788                      7                              12,407
Missouri                              1821                      24                             69,709
New Jersey                            1787                      3                              8,722
Ohio                                  1803                      17                             44,828
South Carolina   13                   1788                      8                              32,008
Utah                                  1896                      45                             84,904
Wisconsin                             1848                      30                             65,499

Table 12.16


Exercise  12.13.15  (Solution  on  p.  581.) 

We are interested in whether or not the number of letters in a state name depends upon the year
the state entered the Union.

a.  Decide  which  variable  should  be  the  independent  variable  and  which  should  be  the  dependent 

variable. 

b.  Make  a  scatter  plot  of  the  data. 

c. Does it appear from inspection that there is a relationship between the variables? Why or why not?

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. What does it imply about the significance of the relationship?

f. Find the estimated number of letters (to the nearest integer) a state would have if it entered the Union in 1900. Find the estimated number of letters a state would have if it entered the Union in 1940.

g.  Use  the  two  points  in  (f)  to  plot  the  least  squares  line  on  your  graph  from  (b). 

h.  Does  it  appear  that  a  line  is  the  best  way  to  fit  the  data?  Why  or  why  not? 

i.  Use  the  least  squares  line  to  estimate  the  number  of  letters  a  new  state  that  enters  the  Union  this 

year  would  have.  Can  the  least  squares  line  be  used  to  predict  it?  Why  or  why  not? 

Exercise  12.13.16 

We  are  interested  in  whether  there  is  a  relationship  between  the  ranking  of  a  state  and  the  area  of 
the  state. 

a.  Let  rank  be  the  independent  variable  and  area  be  the  dependent  variable. 

b. What do you think the scatter plot will look like? Make a scatter plot of the data.

c. Does it appear from inspection that there is a relationship between the variables? Why or why not?

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e.  Find  the  correlation  coefficient.  What  does  it  imply  about  the  significance  of  the  relationship? 

f.  Find  the  estimated  areas  for  Alabama  and  for  Colorado.  Are  they  close  to  the  actual  areas? 

g.  Use  the  two  points  in  (f)  to  plot  the  least  squares  line  on  your  graph  from  (b). 

h.  Does  it  appear  that  a  line  is  the  best  way  to  fit  the  data?  Why  or  why  not? 

i.  Are  there  any  outliers? 

j. Use the least squares line to estimate the area of a new state that enters the Union. Can the least squares line be used to predict it? Why or why not?

k. Delete "Hawaii" and substitute "Alaska" for it. Alaska is the fortieth state with an area of 656,424 square miles.

l. Calculate the new least squares line.

m.  Find  the  estimated  area  for  Alabama.  Is  it  closer  to  the  actual  area  with  this  new  least  squares 

line  or  with  the  previous  one  that  included  Hawaii?  Why  do  you  think  that's  the  case? 
n.  Do  you  think  that,  in  general,  newer  states  are  larger  than  the  original  states? 

Exercise  12.13.17  (Solution  on  p.  581.) 

We  are  interested  in  whether  there  is  a  relationship  between  the  rank  of  a  state  and  the  year  it 
entered  the  Union. 

a.  Let  year  be  the  independent  variable  and  rank  be  the  dependent  variable. 

b.  What  do  you  think  the  scatter  plot  will  look  like?  Make  a  scatter  plot  of  the  data. 

c. Why must the relationship be positive between the variables?

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. What does it imply about the significance of the relationship?

f. Let's say a fifty-first state entered the Union. Based upon the least squares line, when should that have occurred?

g.  Using  the  least  squares  line,  how  many  states  do  we  currently  have? 

h.  Why  isn't  the  least  squares  line  a  good  estimator  for  this  year? 

Exercise  12.13.18 

Below are the percents of the U.S. labor force (excluding self-employed and unemployed) that
are members of a union. We are interested in whether the decrease is significant. (Source: Bureau
of Labor Statistics, U.S. Dept. of Labor)

Year    Percent
1945    35.5
1950    31.5
1960    31.4
1970    27.3
1980    21.9
1993    15.8
2011    11.8

Table 12.17


a.  Let  year  be  the  independent  variable  and  percent  be  the  dependent  variable. 

b.  What  do  you  think  the  scatter  plot  will  look  like?  Make  a  scatter  plot  of  the  data. 

c. Why will the relationship between the variables be negative?

d. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

e. Find the correlation coefficient. What does it imply about the significance of the relationship?

f. Based on your answer to (e), do you think that the relationship can be said to be decreasing?

g. If the trend continues, when will there no longer be any union members? Do you think that will happen?
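
Part (g) amounts to finding where the fitted line crosses zero: setting ŷ = a + bx equal to 0 gives x = -a/b. The sketch below works this out for the data in Table 12.17. It is an illustration only; the variable names are hypothetical, and the result is an extrapolation far beyond the observed years, so it should be read with the same caution the exercise asks for.

```python
# Data copied from Table 12.17 (year, percent of labor force in a union).
years = [1945, 1950, 1960, 1970, 1980, 1993, 2011]
percent = [35.5, 31.5, 31.4, 27.3, 21.9, 15.8, 11.8]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(percent) / n

# Slope and intercept of the least squares line y-hat = a + b*x.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, percent))
sxx = sum((x - x_bar) ** 2 for x in years)
b = sxy / sxx
a = y_bar - b * x_bar

# Year at which the line predicts 0 percent: solve 0 = a + b*x for x.
zero_year = -a / b

print(f"y-hat = {a:.4f} + ({b:.4f})x")
print(f"The line reaches 0 percent near the year {zero_year:.0f}")
```

Because the latest observed year is 2011, the zero crossing lies outside the domain of the data, which is why the exercise asks whether it will actually happen.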

The next two questions refer to the following information: The data below reflects the 1991-92 Reunion
Class Giving. (Source: SUNY Albany alumni magazine)

Class Year    Average Gift    Total Giving
1922          41.67           125
1927          60.75           1,215
1932          83.82           3,772
1937          87.84           5,710
1947          88.27           6,003
1952          76.14           5,254
1957          52.29           4,393
1962          57.80           4,451
1972          42.68           18,093
1976          49.39           22,473
1981          46.87           20,997
1986          37.03           12,590

Table 12.18


Exercise  12.13.19  (Solution  on  p.  581.) 

We will use the columns "class year" and "total giving" for all questions, unless otherwise stated.

a. What do you think the scatter plot will look like? Make a scatter plot of the data.

b. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

c. Find the correlation coefficient. What does it imply about the significance of the relationship?

d. For the class of 1930, predict the total class gift.

e. For the class of 1964, predict the total class gift.

f. For the class of 1850, predict the total class gift. Why doesn't this value make any sense?

Exercise  12.13.20 

We  will  use  the  columns  "class  year"  and  "average  gift"  for  all  questions,  unless  otherwise  stated. 

a.  What  do  you  think  the  scatter  plot  will  look  like?  Make  a  scatter  plot  of  the  data. 

b. Calculate the least squares line. Put the equation in the form of: ŷ = a + bx

c.  Find  the  correlation  coefficient.  What  does  it  imply  about  the  significance  of  the  relationship? 

d.  For  the  class  of  1930,  predict  the  average  class  gift. 

e.  For  the  class  of  1964,  predict  the  average  class  gift. 

f. For the class of 2010, predict the average class gift. Why doesn't this value make any sense?

Exercise  12.13.21  (Solution  on  p.  581.) 

We are interested in exploring the relationship between the weight of a vehicle and its fuel
efficiency (gasoline mileage). The data in the table show the weights, in pounds, and fuel efficiency,
measured in miles per gallon, for a sample of 12 vehicles.


Weight    Fuel Efficiency
2715      24
2570      28
2610      29
2750      38
3000      25
3410      22
3640      20
3700      26
3880      21
3900      18
4060      18
4710      15

Table 12.19


a.  Graph  a  scatterplot  of  the  data. 

b.  Find  the  correlation  coefficient  and  determine  if  it  is  significant. 

c.  Find  the  equation  of  the  best  fit  line. 

d.  Write  the  sentence  that  interprets  the  meaning  of  the  slope  of  the  line  in  the  context  of  the  data. 

e.  What  percent  of  the  variation  in  fuel  efficiency  is  explained  by  the  variation  in  the  weight  of  the 

vehicles,  using  the  regression  line?  (State  your  answer  in  a  complete  sentence  in  the  context 
of  the  data.) 



f.  Accurately  graph  the  best  fit  line  on  your  scatterplot. 

g. For the vehicle that weighs 3000 pounds, find the residual (y - ŷ). Does the value predicted by the line underestimate or overestimate the observed data value?

h.  Identify  any  outliers,  using  either  the  graphical  or  numerical  procedure  demonstrated  in  the 

textbook. 

i. The outlier is a hybrid car that runs on gasoline and electric technology, but all other vehicles in the sample have engines that use gasoline only. Explain why it would be appropriate to remove the outlier from the data in this situation. Remove the outlier from the sample data. Find the new correlation coefficient, coefficient of determination, and best fit line.

j. Compare the correlation coefficients and coefficients of determination before and after removing the outlier, and explain in complete sentences what these numbers indicate about how the model has changed.
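
Parts (g) through (j) can be explored numerically with the sketch below. It assumes the numerical outlier rule of flagging any point whose residual is more than two standard deviations of the residuals, s = sqrt(SSE / (n - 2)), away from the line; the helper names `fit` and `flag_outliers` are hypothetical and the data are copied from Table 12.19. Treat it as one possible way to check hand or calculator work, not as the required method.

```python
# Data copied from Table 12.19 (weight in pounds, fuel efficiency in mpg).
weights = [2715, 2570, 2610, 2750, 3000, 3410, 3640, 3700, 3880, 3900, 4060, 4710]
mpg = [24, 28, 29, 38, 25, 22, 20, 26, 21, 18, 18, 15]


def fit(x, y):
    """Return intercept a, slope b, and correlation r of the least squares line."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / (sxx * syy) ** 0.5


def flag_outliers(x, y, a, b, k=2.0):
    """Return points whose |residual| exceeds k * s (assumed rule, k = 2)."""
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    s = (sse / (len(x) - 2)) ** 0.5
    return [(xi, yi) for xi, yi, e in zip(x, y, residuals) if abs(e) > k * s]


a, b, r_before = fit(weights, mpg)

# Residual (observed minus predicted) for the 3000-pound vehicle, part (g).
residual_3000 = 25 - (a + b * 3000)

outliers = flag_outliers(weights, mpg, a, b)

# Refit without the flagged point(s), parts (i) and (j).
kept = [(w, m) for w, m in zip(weights, mpg) if (w, m) not in outliers]
a_after, b_after, r_after = fit([w for w, _ in kept], [m for _, m in kept])

print("residual at 3000 lb:", round(residual_3000, 2))
print("flagged:", outliers)
print("r, r^2 before:", round(r_before, 4), round(r_before ** 2, 4))
print("r, r^2 after: ", round(r_after, 4), round(r_after ** 2, 4))
```

A negative residual means the line overestimates the observed value; comparing r² before and after shows how much more of the variation the line explains once the hybrid car is removed.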

Exercise  12.13.22  (Solution  on  p.  581.) 

The four data sets below were created by statistician Francis Anscombe. They show why it is
important to examine the scatterplots for your data, in addition to finding the correlation coefficient,
in order to evaluate the appropriateness of fitting a linear model.


    Set 1           Set 2           Set 3           Set 4
x     y         x     y         x     y         x     y
10    8.04      10    9.14      10    7.46      8     6.58
8     6.95      8     8.14      8     6.77      8     5.76
13    7.58      13    8.74      13    12.74     8     7.71
9     8.81      9     8.77      9     7.11      8     8.84
11    8.33      11    9.26      11    7.81      8     8.47
14    9.96      14    8.10      14    8.84      8     7.04
6     7.24      6     6.13      6     6.08      8     5.25
4     4.26      4     3.10      4     5.39      19    12.50
12    10.84     12    9.13      12    8.15      8     5.56
7     4.82      7     7.26      7     6.42      8     7.91
5     5.68      5     4.74      5     5.73      8     6.89

Table 12.20


a.  For  each  data  set,  find  the  least  squares  regression  line  and  the  correlation  coefficient.  What  did 
you  discover  about  the  lines  and  values  of  r? 

For  each  data  set,  create  a  scatter  plot  and  graph  the  least  squares  regression  line.  Use  the  graphs 
to  answer  the  following  questions: 

b.  For  which  data  set  does  it  appear  that  a  curve  would  be  a  more  appropriate  model  than  a  line? 

c.  Which  data  set  has  an  influential  point  (point  close  to  or  on  the  line  that  greatly  influences  the 

best  fit  line)? 

d.  Which  data  set  has  an  outlier  (obviously  visible  on  the  scatter  plot  with  best  fit  line  graphed)? 

e.  Which  data  set  appears  to  be  the  most  appropriate  to  model  using  the  least  squares  regression 

line? 
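
For part (a), the four fits can be computed side by side. The sketch below is a minimal illustration with a hypothetical helper `line_and_r`; the data are copied from Table 12.20. It only produces the numbers; the scatter plots asked for in parts (b) through (e) still have to be drawn to see how different the four data sets are.

```python
# Data copied from Table 12.20. Sets 1-3 share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
sets = {
    "Set 1": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "Set 2": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "Set 3": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "Set 4": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
              [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def line_and_r(x, y):
    """Return (a, b, r): intercept, slope, and correlation coefficient."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_bar - b * x_bar, b, sxy / (sxx * syy) ** 0.5


results = {name: line_and_r(x, y) for name, (x, y) in sets.items()}

for name, (a, b, r) in results.items():
    print(f"{name}: y-hat = {a:.2f} + {b:.2f}x, r = {r:.3f}")
```

Rounded to two decimal places, the four lines and correlation coefficients agree, even though the data sets have very different shapes when plotted.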

12.13.1 Try these multiple choice questions

Exercise  12.13.23  (Solution  on  p.  582.) 

A correlation coefficient of -0.95 means there is a ____________ between the two variables.


A.  Strong  positive  correlation 

B.  Weak  negative  correlation 

C.  Strong  negative  correlation 

D.  No  Correlation 


Exercise  12.13.24  (Solution  on  p.  582.) 

According to the data reported by the New York State Department of Health regarding West Nile
Virus (http://www.health.state.ny.us/nysdoh/westnile/update/update.htm) for the years 2000-2008,
the least squares line equation for the number of reported dead birds (x) versus the number
of human West Nile virus cases (y) is ŷ = -10.2638 + 0.0491x. If the number of dead birds reported
in a year is 732, how many human cases of West Nile virus can be expected? r = 0.5490

A.  No  prediction  can  be  made. 

B.  19.6 

C.  15 

D.  38.1 


The next three questions refer to the following data (showing the number of hurricanes by category to
directly strike the mainland U.S. each decade), obtained from www.nhc.noaa.gov/gifs/table6.gif. A major
hurricane is one with a strength rating of 3, 4, or 5.

Decade       Total Number of Hurricanes    Number of Major Hurricanes
1941-1950    24                            10
1951-1960    17                            8
1961-1970    14                            6
1971-1980    12                            4
1981-1990    15                            5
1991-2000    14                            5
2001-2004    9                             3

Table 12.21


Exercise  12.13.25  (Solution  on  p.  582.) 

Using only completed decades (1941 - 2000), calculate the least squares line for the number of
major hurricanes expected based upon the total number of hurricanes.

A. ŷ = -1.67x + 0.5

B. ŷ = 0.5x - 1.67

C. ŷ = 0.94x - 1.67

D. ŷ = -2x + 1


http://www.nhc.noaa.gov/gifs/table6.gif

Exercise 12.13.26 (Solution on p. 582.)

The  correlation  coefficient  is  0.942.  Is  this  considered  significant?  Why  or  why  not? 

A.  No,  because  0.942  is  greater  than  the  critical  value  of  0.707 

B.  Yes,  because  0.942  is  greater  than  the  critical  value  of  0.707 

C. No, because 0.942 is greater than the critical value of 0.811

D.  Yes,  because  0.942  is  greater  than  the  critical  value  of  0.811 

Exercise  12.13.27  (Solution  on  p.  582.) 

The data for 2001-2004 show 9 hurricanes have hit the mainland United States. The line of best fit
predicts 2.83 major hurricanes to hit mainland U.S. Can the least squares line be used to make this
prediction?

A.  No,  because  9  lies  outside  the  independent  variable  values 

B.  Yes,  because,  in  fact,  there  have  been  3  major  hurricanes  this  decade 

C.  No,  because  2.83  lies  outside  the  dependent  variable  values 

D.  Yes,  because  how  else  could  we  predict  what  is  going  to  happen  this  decade. 
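
The figures quoted in the last three exercises (r = 0.942 and a prediction of 2.83 major hurricanes) can be reproduced from the six completed decades in Table 12.21. The sketch below is a check under that reading of the exercises; the variable names are hypothetical. It also tests whether x = 9 falls inside the range of x values used to build the line, which is the consideration behind the final question.

```python
# Completed decades only (1941-2000), copied from Table 12.21.
total = [24, 17, 14, 12, 15, 14]   # x: total number of hurricanes
major = [10, 8, 6, 4, 5, 5]        # y: number of major hurricanes

n = len(total)
x_bar = sum(total) / n
y_bar = sum(major) / n
sxx = sum((x - x_bar) ** 2 for x in total)
syy = sum((y - y_bar) ** 2 for y in major)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(total, major))

b = sxy / sxx                   # slope
a = y_bar - b * x_bar           # intercept
r = sxy / (sxx * syy) ** 0.5    # correlation coefficient

# Prediction for the 9 hurricanes recorded in 2001-2004.
x_new = 9
prediction = a + b * x_new

# Is x_new within the range of x values the line was fitted to?
in_range = min(total) <= x_new <= max(total)

print(f"y-hat = {b:.2f}x + ({a:.2f}), r = {r:.3f}")
print(f"prediction at x = {x_new}: {prediction:.2f}, inside fitted range: {in_range}")
```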

Exercises  21  and  22  contributed  by  Roberta  Bloom 

12.14 Lab 1: Regression (Distance from School)

Class  Time: 
Names: 

12.14.1  Student  Learning  Outcomes: 

•  The  student  will  calculate  and  construct  the  line  of  best  fit  between  two  variables. 

•  The  student  will  evaluate  the  relationship  between  two  variables  to  determine  if  that  relationship  is 
significant. 

12.14.2  Collect  the  Data 

Use 8 members of your class for the sample. Collect bivariate data (distance an individual lives from school,
the cost of supplies for the current term).

1.  Complete  the  table. 


Distance  from  school 

Cost  of  supplies  this  term 

Table  12.22 

2.  Which  variable  should  be  the  dependent  variable  and  which  should  be  the  independent  variable? 
Why? 

3.  Graph  "distance"  vs.  "cost."  Plot  the  points  on  the  graph.  Label  both  axes  with  words.  Scale  both 
axes. 

This content is available online at <http://cnx.org/content/m17080/1.11/>.

Figure 12.18


12.14.3  Analyze  the  Data 

Enter  your  data  into  your  calculator  or  computer.  Write  the  linear  equation  below,  rounding  to  4  decimal 
places. 

1.  Calculate  the  following: 

a. a =

b. b =

c. correlation =

d. n =

e. equation: ŷ =

f. Is the correlation significant? Why or why not? (Answer in 1-3 complete sentences.)

2. Supply an answer for the following scenarios:

a.  For  a  person  who  lives  8  miles  from  campus,  predict  the  total  cost  of  supplies  this  term: 

b.  For  a  person  who  lives  80  miles  from  campus,  predict  the  total  cost  of  supplies  this  term: 

3.  Obtain  the  graph  on  your  calculator  or  computer.  Sketch  the  regression  line  below. 

Figure 12.19


12.14.4  Discussion  Questions 

1.  Answer  each  with  1-3  complete  sentences. 

a.  Does  the  line  seem  to  fit  the  data?  Why? 

b.  What  does  the  correlation  imply  about  the  relationship  between  the  distance  and  the  cost? 

2.  Are  there  any  outliers?  If  so,  which  point  is  an  outlier? 

3.  Should  the  outlier,  if  it  exists,  be  removed?  Why  or  why  not? 

12.15 Lab 2: Regression (Textbook Cost)

Class  Time: 
Names: 

12.15.1  Student  Learning  Outcomes: 

•  The  student  will  calculate  and  construct  the  line  of  best  fit  between  two  variables. 

•  The  student  will  evaluate  the  relationship  between  two  variables  to  determine  if  that  relationship  is 
significant. 

12.15.2  Collect  the  Data 

Survey  10  textbooks.  Collect  bivariate  data  (number  of  pages  in  a  textbook,  the  cost  of  the  textbook). 
1.  Complete  the  table. 


Number  of  pages 

Cost  of  textbook 

Table  12.23 


2.  Which  variable  should  be  the  dependent  variable  and  which  should  be  the  independent  variable? 

Why? 

3. Graph "pages" vs. "cost." Plot the points on the graph in "Analyze the Data". Label both axes with
words. Scale both axes.

12.15.3  Analyze  the  Data 

Enter  your  data  into  your  calculator  or  computer.  Write  the  linear  equation  below,  rounding  to  4  decimal 
places. 

1.  Calculate  the  following: 

a. a =

b. b =

c.  correlation  = 

d.  n  = 

This content is available online at <http://cnx.org/content/m17087/1.9/>.

e. equation: ŷ =

f.  Is  the  correlation  significant?  Why  or  why  not?  (Answer  in  1-3  complete  sentences.) 

2. Supply an answer for the following scenarios:

a.  For  a  textbook  with  400  pages,  predict  the  cost: 

b.  For  a  textbook  with  600  pages,  predict  the  cost: 

3.  Obtain  the  graph  on  your  calculator  or  computer.  Sketch  the  regression  line  below. 


Figure  12.20 


12.15.4  Discussion  Questions 

1.  Answer  each  with  1-3  complete  sentences. 

a.  Does  the  line  seem  to  fit  the  data?  Why? 

b.  What  does  the  correlation  imply  about  the  relationship  between  the  number  of  pages  and  the  cost? 

2.  Are  there  any  outliers?  If  so,  which  point(s)  is  an  outlier? 

3.  Should  the  outlier,  if  it  exists,  be  removed?  Why  or  why  not? 

12.16 Lab 3: Regression (Fuel Efficiency)

Class  Time: 
Names: 

12.16.1  Student  Learning  Outcomes: 

•  The  student  will  calculate  and  construct  the  line  of  best  fit  between  two  variables. 

•  The  student  will  evaluate  the  relationship  between  two  variables  to  determine  if  that  relationship  is 
significant. 

12.16.2  Collect  the  Data 

Use  the  most  recent  April  issue  of  Consumer  Reports.  It  will  give  the  total  fuel  efficiency  (in  miles  per 
gallon)  and  weight  (in  pounds)  of  new  model  cars  with  automatic  transmissions.  We  will  use  this  data  to 
determine  the  relationship,  if  any,  between  the  fuel  efficiency  of  a  car  and  its  weight. 

1.  Which  variable  should  be  the  independent  variable  and  which  should  be  the  dependent  variable? 

Explain  your  answer  in  one  or  two  complete  sentences. 

2.  Using  your  random  number  generator,  randomly  select  20  cars  from  the  list  and  record  their  weights 
and  fuel  efficiency  into  the  table  below. 


Weight 

Fuel  Efficiency 

^^This content is available online at <http://cnx.org/content/ml7079/1.8/>.
Table 12.24

3.  Which  variable  should  be  the  dependent  variable  and  which  should  be  the  independent  variable? 
Why? 

4.  By  hand,  do  a  scatterplot  of  "weight"  vs.  "fuel  efficiency".  Plot  the  points  on  graph  paper.  Label  both 
axes  with  words.  Scale  both  axes  accurately. 


Figure  12.21 


12.16.3  Analyze  the  Data 

Enter  your  data  into  your  calculator  or  computer.  Write  the  linear  equation  below,  rounding  to  4  decimal 
places. 

1.  Calculate  the  following: 

a. a =

b.  b  = 

c.  correlation  = 

d. n =

e. equation: $\hat{y}$ =
2. Obtain the graph of the regression line on your calculator. Sketch the regression line on the same axes
as  your  scatterplot. 


12.16.4  Discussion  Questions 

1.  Is  the  correlation  significant?  Explain  how  you  determined  this  in  complete  sentences. 

2.  Is  the  relationship  a  positive  one  or  a  negative  one?  Explain  how  you  can  tell  and  what  this  means  in 
terms  of  weight  and  fuel  efficiency. 

3.  In  one  or  two  complete  sentences,  what  is  the  practical  interpretation  of  the  slope  of  the  least  squares 
line  in  terms  of  fuel  efficiency  and  weight? 

4.  For  a  car  that  weighs  4000  pounds,  predict  its  fuel  efficiency.  Include  units. 

5.  Can  we  predict  the  fuel  efficiency  of  a  car  that  weighs  10000  pounds  using  the  least  squares  line? 
Explain  why  or  why  not. 

6.  Questions.  Answer  each  in  1  to  3  complete  sentences. 

a.  Does  the  line  seem  to  fit  the  data?  Why  or  why  not? 

b.  What  does  the  correlation  imply  about  the  relationship  between  fuel  efficiency  and  weight  of  a 

car?  Is  this  what  you  expected? 

7.  Are  there  any  outliers?  If  so,  which  point  is  an  outlier? 
**  This  lab  was  designed  and  contributed  by  Diane  Mathios. 
Solutions to Exercises in Chapter 12

Solution  to  Example  12.11,  Problem  2  (p.  541) 

The x values in the data are between 65 and 75. 90 is outside of the domain of the observed x values in
the  data  (independent  variable),  so  you  cannot  reliably  predict  the  final  exam  score  for  this  student.  (Even 
though  it  is  possible  to  enter  x  into  the  equation  and  calculate  a  y  value,  you  should  not  do  so!) 

To  really  understand  how  unreliable  the  prediction  can  be  outside  of  the  observed  x  values  in  the 
data,  make  the  substitution  x  =  90  into  the  equation. 

$\hat{y} = -173.51 + 4.83(90) = 261.19$

The  final  exam  score  is  predicted  to  be  261.19.  The  largest  the  final  exam  score  can  be  is  200. 


NOTE:  The  process  of  predicting  inside  of  the  observed  x  values  in  the  data  is  called  interpolation. 
The  process  of  predicting  outside  of  the  observed  x  values  in  the  data  is  called  extrapolation. 
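
The check described in this solution can be sketched in a few lines of Python. The coefficients and the observed range of x (65 to 75) are taken from the solution above; the function name `predict_final_exam` and the constant names are illustrative assumptions, not part of the textbook.

```python
# Least-squares line from the solution above: yhat = -173.51 + 4.83x
INTERCEPT = -173.51
SLOPE = 4.83
X_MIN, X_MAX = 65, 75  # observed range of x values in the data


def predict_final_exam(x):
    """Return (yhat, is_interpolation) for a given x value."""
    yhat = INTERCEPT + SLOPE * x
    # Predictions are only reliable inside the observed range of x
    is_interpolation = X_MIN <= x <= X_MAX
    return yhat, is_interpolation


for x in (73, 90):
    yhat, ok = predict_final_exam(x)
    label = "interpolation" if ok else "extrapolation (unreliable)"
    print(f"x = {x}: yhat = {yhat:.2f} -> {label}")
```

For x = 90 the sketch flags the prediction as extrapolation, which matches the warning above: the computed value exceeds the largest possible final exam score of 200.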

Solutions  to  Practice:  Linear  Regression 

Solution  to  Exercise  12.12.2  (p.  553) 

a. a = -3,448,225
b. b = 1750

c.  corr.  =  0.4526 

d.  n  =  22 

Solution  to  Exercise  12.12.3  (p.  553) 

$\hat{y} = -3{,}448{,}225 + 1750x$

Solution  to  Exercise  12.12.4  (p.  553) 

a.  25,525 

b.  34,275 

Solution  to  Exercise  12.12.10  (p.  554) 
a.  -725 


Solutions  to  Homework 
Solution  to  Exercise  12.13.1  (p.  555) 

a.  Independent:  Age;  Dependent:  Fatalities 

d.  Independent:  Power  Consumption;  Dependent:  Utility 

Solution  to  Exercise  12.13.3  (p.  556) 

b. $\hat{y} = 88.7206 - 0.0432x$

c.  -0.8533,  Yes 

g.  No 

h.  No. 

i.  2.93,  Yes 

j.  slope  =  -0.0432.  As  the  year  increases  by  one,  the  welfare  family  size  tends  to  decrease  by  0.0432  people. 
Solution to Exercise 12.13.5 (p. 556)

b.  Yes 

c. $\hat{y} = 102.4287 + 11.7585x$

d.  0.9436;  yes 

e.  478.70  feet;  1207.73  feet 

g.  Yes 

h.  Yes;  (57, 1050) 

i.  172.98;  No 
j.  11.7585  feet 

k.  slope  =  11.7585.  As  the  number  of  stories  increases  by  one,  the  height  of  the  building  tends  to  increase 
by  11.7585  feet. 

Solution  to  Exercise  12.13.7  (p.  558) 

b.  Yes 

c. $\hat{y} = -266.8863 + 0.1656x$

d.  0.9448;  Yes 

e.  62.8233;  62.3265 

h.  yes;  (1987,  62.7) 

i.  72.5937;  No 

j.  slope  =  0.1656.  As  the  year  increases  by  one,  the  percent  of  workers  paid  hourly  rates  tends  to  increase 
by  0.1656. 

Solution  to  Exercise  12.13.9  (p.  560) 

b.  Yes 

c. $\hat{y} = 3.5984 + 0.0371x$

d.  0.9986;  Yes 

e.  $5.08 

f.  $6.93 
i.  No 

j.  Not  valid 

k.  slope  =  0.0371.  As  the  number  of  ounces  increases  by  one,  the  cost  of  liquid  detergent  tends  to  increase 
by  $0.0371  or  is  predicted  to  increase  by  $0.0371  (about  4  cents). 

Solution  to  Exercise  12.13.11  (p.  561) 

c.  Yes 

d. $\hat{y} = -337{,}424.6478 + 0.5463x$

e.  0.9964;  Yes 

f.  $208,875.35;  $1,028,325.35 

h.  Yes 

i.  No 

k. slope = 0.5463. As the net taxable estate increases by one dollar, the approximate probate fees and taxes
tend  to  increase  by  0.5463  dollars  (about  55  cents). 

Solution  to  Exercise  12.13.13  (p.  562) 

c.  Yes 

d. $\hat{y} = 65.0876 + 7.0948x$

e.  0.9761;  yes 

f.  72.2  cm;  143.13  cm 
h. Yes

i.  No 

j.  505.0  cm;  No 

k.  slope  =  7.0948.  As  the  age  of  an  American  boy  increases  by  one  year,  the  average  height  tends  to  increase 
by  7.0948  cm. 

Solution  to  Exercise  12.13.15  (p.  564) 

c.  No 

d. $\hat{y} = 47.03 - 0.0216x$

e.  -0.4280 

f.  6;  5 

Solution  to  Exercise  12.13.17  (p.  565) 

d. $\hat{y} = -480.5845 + 0.2748x$

e.  0.9553 

f.  1934 

Solution  to  Exercise  12.13.19  (p.  566) 

b. $\hat{y} = -569{,}770.2796 + 296.0351x$

c.  0.8302 

d.  $1577.46 

e.  $11,642.66 

f.  -$22,105.34 

Solution  to  Exercise  12.13.21  (p.  567) 

b.  r  =  -0.8,  significant 

c.  yhat  =  48.4-0.00725X 

d.  For  every  one  pound  increase  in  weight,  the  fuel  efficiency  tends  to  decrease  (or  is  predicted  to  decrease) 

by  0.00725  miles  per  gallon.  (For  every  one  thousand  pounds  increase  in  weight,  the  fuel  efficiency 
tends  to  decrease  by  7.25  miles  per  gallon.) 

e.  64%  of  the  variation  in  fuel  efficiency  is  explained  by  the  variation  in  weight  using  the  regression  line. 

g. yhat = 48.4 - 0.00725(3000) = 26.65 mpg. y - yhat = 25 - 26.65 = -1.65. Because yhat = 26.65 is greater
than y = 25, the line overestimates the observed fuel efficiency.

h.  (2750,38)  is  the  outlier.  Be  sure  you  know  how  to  justify  it  using  the  requested  graphical  or  numerical 

methods,  not  just  by  guessing. 

i.  yhat  =  42.4-0.00578X 

j.  Without  outlier,  r=-0.885,  rsquare=0.76;  with  outlier,  r=-0.8,  rsquare=0.64.  The  new  linear  model  is  a 
better  fit,  after  the  outlier  is  removed  from  the  data,  because  the  new  correlation  coefficient  is  farther 
from  0  and  the  new  coefficient  of  determination  is  larger. 
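
The arithmetic in parts d, e, and g of this solution can be checked with a short Python sketch. The regression coefficients, the value of r, and the point (3000, 25) come from the solution above; the function name `predict_mpg` and the variable names are illustrative assumptions.

```python
# Regression line from part c: yhat = 48.4 - 0.00725x (x = weight in pounds)
INTERCEPT = 48.4
SLOPE = -0.00725
R = -0.8  # correlation coefficient from part b


def predict_mpg(weight_lb):
    """Predicted fuel efficiency (mpg) for a car of the given weight."""
    return INTERCEPT + SLOPE * weight_lb


# Part d: the slope expressed per 1000 pounds instead of per pound
slope_per_1000_lb = SLOPE * 1000

# Part e: the coefficient of determination is r squared
r_squared = R ** 2

# Part g: residual = observed y minus predicted yhat
observed_weight, observed_mpg = 3000, 25
yhat = predict_mpg(observed_weight)
residual = observed_mpg - yhat

print(f"slope per 1000 lb: {slope_per_1000_lb:.2f} mpg")
print(f"r squared: {r_squared:.2f}")
print(f"yhat: {yhat:.2f} mpg, residual: {residual:.2f} mpg")
```

A negative residual means the observed value lies below the line, so the line overestimates the observed fuel efficiency at that weight.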

Solution  to  Exercise  12.13.22  (p.  568) 

a. All four data sets have the same correlation coefficient r=0.816 and the same least squares regression line
yhat=3+0.5x 

b.  Set  2 ;  c.  Set  4  ;  d.  Set  3  ;  e.  Set  1 


Available  for  free  at  Connexions  <http://cnx.Org/content/coll0522/l.40> 


Figure  12.22 


Solution  to  Exercise  12.13.23  (p.  569) 

C 

Solution  to  Exercise  12.13.24  (p.  569) 

A 

Solution  to  Exercise  12.13.25  (p.  569) 

B 

Solution  to  Exercise  12.13.26  (p.  570) 

D 

Solution  to  Exercise  12.13.27  (p.  570) 

A 
Chapter 13

F  Distribution  and  One-Way  ANOVA 


13.1  F  Distribution  and  One-Way  ANOVA' 

13.1.1  Student  Learning  Outcomes 

By  the  end  of  this  chapter,  the  student  should  be  able  to: 

• Interpret the F probability distribution as the number of groups and the sample size change.

•  Discuss  two  uses  for  the  F  distribution:  One-Way  ANOVA  and  the  test  of  two  variances. 

•  Conduct  and  interpret  One-Way  ANOVA. 

• Conduct and interpret hypothesis tests of two variances.


13.1.2  Introduction 

Many  statistical  applications  in  psychology,  social  science,  business  administration,  and  the  natural  sciences 
involve  several  groups.  For  example,  an  environmentalist  is  interested  in  knowing  if  the  average  amount  of 
pollution  varies  in  several  bodies  of  water.  A  sociologist  is  interested  in  knowing  if  the  amount  of  income  a 
person  earns  varies  according  to  his  or  her  upbringing.  A  consumer  looking  for  a  new  car  might  compare 
the  average  gas  mileage  of  several  models. 

For hypothesis tests involving more than two averages, statisticians have developed a method called
"Analysis of Variance" (abbreviated ANOVA). In this chapter, you will study the simplest form of ANOVA called
single factor or One-Way ANOVA. You will also study the F distribution, used for One-Way ANOVA, and
the test of two variances. This is just a very brief overview of One-Way ANOVA. You will study this topic
in much greater detail in future statistics courses.

•  One-Way  ANOVA,  as  it  is  presented  here,  relies  heavily  on  a  calculator  or  computer. 

•  For  further  information  about  One-Way  ANOVA,  use  the  online  link  ANOVA^  .  Use  the  back  button 
to  return  here.  (The  url  is  http://en.wikipedia.org/wiki/Analysis_of_variance.) 


^This content is available online at <http://cnx.org/content/m17065/1.11/>.
^http://en.wikipedia.org/wiki/Analysis_of_variance
13.2 One-Way ANOVA'

13.2.1  F  Distribution  and  One-Way  ANOVA:  Purpose  and  Basic  Assumptions  of  One- 
Way  ANOVA 

The  purpose  of  a  One-Way  ANOVA  test  is  to  determine  the  existence  of  a  statistically  significant  difference 
among  several  group  means.  The  test  actually  uses  variances  to  help  determine  if  the  means  are  equal  or 
not. 

In  order  to  perform  a  One-Way  ANOVA  test,  there  are  five  basic  assumptions  to  be  fulfilled: 

•  Each  population  from  which  a  sample  is  taken  is  assumed  to  be  normal. 

•  Each  sample  is  randomly  selected  and  independent. 

•  The  populations  are  assumed  to  have  equal  standard  deviations  (or  variances). 

•  The  factor  is  the  categorical  variable. 

•  The  response  is  the  numerical  variable. 

13.2.2  The  Null  and  Alternate  Hypotheses 

The null hypothesis is simply that all the group population means are the same. The alternate hypothesis
is that at least one pair of means is different. For example, if there are k groups:

$H_0: \mu_1 = \mu_2 = \mu_3 = \dots = \mu_k$

$H_a$: At least two of the group means $\mu_1, \mu_2, \mu_3, \dots, \mu_k$ are not equal.

The graphs help in the understanding of the hypothesis test. In the first graph (red box plots),
$H_0: \mu_1 = \mu_2 = \mu_3$ and the three populations have the same distribution if the null hypothesis is true. The
variance of the combined data is approximately the same as the variance of each of the populations.

If the null hypothesis is false, then the variance of the combined data is larger, which is caused by
the different means as shown in the second graph (green box plots).

^This content is available online at <http://cnx.org/content/m17068/1.10/>.
$H_0$ is true. All means are the same.


$H_0$ is false. Not all means are the same.


13.3  The  F  Distribution  and  the  F  Ratio' 

The distribution used for the hypothesis test is a new one. It is called the F distribution, named after Sir
Ronald  Fisher,  an  English  statistician.  The  F  statistic  is  a  ratio  (a  fraction).  There  are  two  sets  of  degrees  of 
freedom;  one  for  the  numerator  and  one  for  the  denominator. 


*This  content  is  available  online  at  <http://cnx.Org/content/ml7076/l.14/>. 


For example, if F follows an F distribution and the degrees of freedom for the numerator are 4 and the
degrees of freedom for the denominator are 10, then $F \sim F_{4,10}$.

NOTE: The F distribution is derived from the Student's t-distribution. One-Way ANOVA expands
the t-test for comparing more than two groups. The scope of that derivation is beyond the level of
this course.

To  calculate  the  F  ratio,  two  estimates  of  the  variance  are  made. 

1. Variance between samples: An estimate of $\sigma^2$ that is the variance of the sample means multiplied by
n (when every sample has the same size n). If the samples are different sizes, the variance between
samples is weighted to account for the different sample sizes. The variance is also called variation
due to treatment or explained variation.

2. Variance within samples: An estimate of $\sigma^2$ that is the average of the sample variances (also known
as a pooled variance). When the sample sizes are different, the variance within samples is weighted.
The variance is also called the variation due to error or unexplained variation.


• $SS_{between}$ = the sum of squares that represents the variation among the different samples.

• $SS_{within}$ = the sum of squares that represents the variation within samples that is due to chance.

To  find  a  "sum  of  squares"  means  to  add  together  squared  quantities  which,  in  some  cases,  may  be  weighted. 
We  used  sum  of  squares  to  calculate  the  sample  variance  and  the  sample  standard  deviation  in  Descriptive 
Statistics. 

MS means "mean square." $MS_{between}$ is the variance between groups and $MS_{within}$ is the variance within
groups.

Calculation  of  Sum  of  Squares  and  Mean  Square 

• k = the number of different groups

• $n_j$ = the size of the jth group

• $s_j$ = the sum of the values in the jth group

• n = total number of all the values combined (total sample size: $\sum n_j$)

• x = one value: $\sum x = \sum s_j$

• Sum of squares of all values from every group combined: $\sum x^2$

• Total variability (total sum of squares): $SS_{total} = \sum x^2 - \frac{(\sum x)^2}{n}$

• Explained variation, the sum of squares representing variation among the different samples:
$SS_{between} = \sum \left[ \frac{(s_j)^2}{n_j} \right] - \frac{(\sum s_j)^2}{n}$

• Unexplained variation, the sum of squares representing variation within samples due to chance:
$SS_{within} = SS_{total} - SS_{between}$

• df's for different groups (df's for the numerator): $df_{between} = k - 1$

• Equation for errors within samples (df's for the denominator): $df_{within} = n - k$

• Mean square (variance estimate) explained by the different groups: $MS_{between} = \frac{SS_{between}}{df_{between}}$

• Mean square (variance estimate) that is due to chance (unexplained): $MS_{within} = \frac{SS_{within}}{df_{within}}$

$MS_{between}$ and $MS_{within}$ can be written as follows:

• $MS_{between} = \frac{SS_{between}}{df_{between}} = \frac{SS_{between}}{k - 1}$

• $MS_{within} = \frac{SS_{within}}{df_{within}} = \frac{SS_{within}}{n - k}$
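
The sum-of-squares and mean-square formulas listed above can be sketched in Python as follows. The data set is hypothetical (three small groups of unequal size chosen so the arithmetic is easy to follow), and the function names are illustrative assumptions. The sketch also computes the within-group sum of squares directly from its definition, as a check that the subtraction $SS_{total} - SS_{between}$ gives the same value.

```python
def sums_of_squares(groups):
    """Return (SS_total, SS_between, SS_within) using the shortcut formulas."""
    n = sum(len(g) for g in groups)                  # total sample size
    sum_x = sum(sum(g) for g in groups)              # sum of all values (= sum of the s_j)
    sum_x2 = sum(x * x for g in groups for x in g)   # sum of squares of all values
    correction = sum_x ** 2 / n                      # (sum of x)^2 / n
    ss_total = sum_x2 - correction
    ss_between = sum(sum(g) ** 2 / len(g) for g in groups) - correction
    ss_within = ss_total - ss_between
    return ss_total, ss_between, ss_within


def ss_within_direct(groups):
    """SS_within from its definition: squared deviations from each group's own mean."""
    total = 0.0
    for g in groups:
        group_mean = sum(g) / len(g)
        total += sum((x - group_mean) ** 2 for x in g)
    return total


def f_ratio(groups):
    """F = MS_between / MS_within, with df = (k - 1, n - k)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    _, ss_between, ss_within = sums_of_squares(groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical data: three groups of unequal size
groups = [[1, 2, 3], [2, 4, 6, 8], [5, 5]]
ss_total, ss_between, ss_within = sums_of_squares(groups)
print(ss_total, ss_between, ss_within)   # 40.0 18.0 22.0
print(ss_within_direct(groups))          # 22.0
print(round(f_ratio(groups), 4))         # 2.4545
```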
The One-Way ANOVA test depends on the fact that $MS_{between}$ can be influenced by population differences
among means of the several groups. Since $MS_{within}$ compares values of each group to its own group mean,
the fact that group means might be different does not affect $MS_{within}$.

The null hypothesis says that all groups are samples from populations having the same normal distribution.
The alternate hypothesis says that at least two of the sample groups come from populations with different
normal distributions. If the null hypothesis is true, $MS_{between}$ and $MS_{within}$ should both estimate the same
value.

NOTE: The null hypothesis says that all the group population means are equal. The hypothesis of
equal means implies that the populations have the same normal distribution because it is assumed
that the populations are normal and that they have equal variances.

F-Ratio or F Statistic

$F = \frac{MS_{between}}{MS_{within}}$

If $MS_{between}$ and $MS_{within}$ estimate the same value (following the belief that $H_0$ is true), then the F-ratio
should be approximately equal to 1. Mostly just sampling errors would contribute to variations away
from 1. As it turns out, $MS_{between}$ consists of the population variance plus a variance produced from the
differences between the samples. $MS_{within}$ is an estimate of the population variance. Since variances are
always positive, if the null hypothesis is false, $MS_{between}$ will generally be larger than $MS_{within}$. Then the
F-ratio will be larger than 1. However, if the population effect size is small, it is not unlikely that $MS_{within}$
will be larger in a given sample.

The above calculations were done with groups of different sizes. If the groups are the same size, the
calculations simplify somewhat and the F ratio can be written as:

F-Ratio Formula when the groups are the same size

$F = \frac{n \cdot s_{\bar{x}}^2}{s^2_{pooled}}$ (13.2)


where ...

• n = the sample size (the number of values in each group)

• $df_{numerator} = k - 1$

• $df_{denominator}$ = (total number of values) - k

• $s^2_{pooled}$ = the mean of the sample variances (pooled variance)

• $s_{\bar{x}}^2$ = the variance of the sample means
The data is typically put into a table for easy viewing. One-Way ANOVA results are often displayed in this
manner by computer software.


Source of Variation   Sum of Squares (SS)   Degrees of Freedom (df)   Mean Square (MS)                F

Factor (Between)      SS(Factor)            k-1                       MS(Factor) = SS(Factor)/(k-1)   F = MS(Factor)/MS(Error)

Error (Within)        SS(Error)             n-k                       MS(Error) = SS(Error)/(n-k)

Total                 SS(Total)             n-1

Table  13.1 

Example  13.1 

Three  different  diet  plans  are  to  be  tested  for  mean  weight  loss.  The  entries  in  the  table  are  the 
weight losses for the different plans. The One-Way ANOVA table is shown below.


Plan 1   Plan 2   Plan 3

5        3.5      8

4.5      7        4

4                 3.5

3        4.5

Table  13.2 


One-Way ANOVA Table: The formulas for SS(Total), SS(Factor) = SS(Between) and SS(Error) =
SS(Within) are shown above. This same information is provided by the TI calculator hypothesis
test function ANOVA in STAT TESTS (syntax is ANOVA(L1, L2, L3) where L1, L2, L3 have the
data from Plan 1, Plan 2, Plan 3 respectively).


Source of Variation: Factor (Between)
  Sum of Squares (SS): SS(Factor) = SS(Between) = 2.2458
  Degrees of Freedom (df): k-1 = 3 groups - 1 = 2
  Mean Square (MS): MS(Factor) = SS(Factor)/(k-1) = 2.2458/2 = 1.1229
  F: MS(Factor)/MS(Error) = 1.1229/2.9792 = 0.3769

Source of Variation: Error (Within)
  Sum of Squares (SS): SS(Error) = SS(Within) = 20.8542
  Degrees of Freedom (df): n-k = 10 total data - 3 groups = 7
  Mean Square (MS): MS(Error) = SS(Error)/(n-k) = 20.8542/7 = 2.9792

Source of Variation: Total
  Sum of Squares (SS): SS(Total) = 2.2458 + 20.8542 = 23.1
  Degrees of Freedom (df): n-1 = 10 total data - 1 = 9

Table  13.3 
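
The entries of Table 13.3 can be reproduced with a short Python sketch. The group assignment of the weight losses (four values for Plan 1, three each for Plans 2 and 3) is read from Table 13.2; the variable names are illustrative assumptions. This sketch uses the definitional form of each sum of squares (squared deviations from the group means and the grand mean) rather than the shortcut formulas.

```python
from statistics import mean

# Weight losses from Table 13.2
plans = {
    "Plan 1": [5, 4.5, 4, 3],
    "Plan 2": [3.5, 7, 4.5],
    "Plan 3": [8, 4, 3.5],
}

groups = list(plans.values())
all_values = [x for g in groups for x in g]
k = len(groups)        # number of groups
n = len(all_values)    # total number of values
grand_mean = mean(all_values)

# Between-group, within-group, and total sums of squares
ss_factor = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)
ss_total = sum((x - grand_mean) ** 2 for x in all_values)

ms_factor = ss_factor / (k - 1)
ms_error = ss_error / (n - k)
f_stat = ms_factor / ms_error

print(f"Factor: SS = {ss_factor:.4f}, df = {k - 1}, MS = {ms_factor:.4f}, F = {f_stat:.4f}")
print(f"Error:  SS = {ss_error:.4f}, df = {n - k}, MS = {ms_error:.4f}")
print(f"Total:  SS = {ss_total:.4f}, df = {n - 1}")
```

Note that SS(Total) equals SS(Factor) plus SS(Error), which is a useful check on any hand calculation.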


The One-Way ANOVA hypothesis test is always right-tailed because larger F-values are way out in the
right tail of the F-distribution curve and tend to make us reject $H_0$.
13.3.1 Notation

The notation for the F distribution is $F \sim F_{df(num),\,df(denom)}$
where $df(num) = df_{between}$ and $df(denom) = df_{within}$.
The mean for the F distribution is $\mu = \frac{df(denom)}{df(denom) - 2}$, which is defined when $df(denom) > 2$.


13.4 Facts About the F Distribution'

1.  The  curve  is  not  symmetrical  but  skewed  to  the  right. 

2.  There  is  a  different  curve  for  each  set  of  dfs. 

3.  The  F  statistic  is  greater  than  or  equal  to  zero. 

4. As the degrees of freedom for the numerator and for the denominator get larger, the curve
approximates the normal.

5.  Other  uses  for  the  F  distribution  include  comparing  two  variances  and  Two-Way  Analysis  of  Variance. 
Comparing  two  variances  is  discussed  at  the  end  of  the  chapter.  Two-Way  Analysis  is  mentioned  for 
your  information  only. 


(a) $F_{10,25}$   (b) $F_{40,40}$
Figure  13.1 


Example  13.2 

One-Way  ANOVA:  Four  sororities  took  a  random  sample  of  sisters  regarding  their  grade  means 
for  the  past  term.  The  results  are  shown  below: 


^This  content  is  available  online  at  <http://cnx.Org/content/ml7062/l.14/>. 
MEAN GRADES FOR FOUR SORORITIES

Sorority 1   Sorority 2   Sorority 3   Sorority 4

2.17         2.63         2.63         3.79

1.85         1.77         3.78         3.45

2.83         3.25         4.00         3.08

1.69         1.86         2.55         2.26

3.33         2.21         2.45         3.18

Table  13.4 

Problem 

Using  a  significance  level  of  1%,  is  there  a  difference  in  mean  grades  among  the  sororities? 
Solution 

Let $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$ be the population means of the sororities. Remember that the null hypothesis
claims that the sorority groups are from the same normal distribution. The alternate hypothesis
says that at least two of the sorority groups come from populations with different normal
distributions. Notice that the four sample sizes are each size 5.

NOTE: This is an example of a balanced design, since each level of the factor (i.e., each sorority)
has the same number of observations.

$H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4$

$H_a$: Not all of the means $\mu_1$, $\mu_2$, $\mu_3$, $\mu_4$ are equal.

Distribution for the test: $F_{3,16}$

where k = 4 groups and n = 20 samples in total

df(num) = k - 1 = 4 - 1 = 3

df(denom) = n - k = 20 - 4 = 16

Calculate  the  test  statistic:  F  =  2.23 

Graph: 


Figure  13.2 


Probability  statement:  p-value  =  P  (F  >  2.23)  =  0.1241 

Compare $\alpha$ and the p-value: $\alpha$ = 0.01 and p-value = 0.1241, so $\alpha$ < p-value.

Make a decision: Since $\alpha$ < p-value, you cannot reject $H_0$.

Conclusion:  There  is  not  sufficient  evidence  to  conclude  that  there  is  a  difference  among  the  mean 
grades  for  the  sororities. 

TI-83+ or TI-84: Put the data into lists L1, L2, L3, and L4. Press STAT and arrow over to TESTS.
Arrow down to F:ANOVA. Press ENTER and enter (L1,L2,L3,L4). The F statistic is 2.2303 and the
p-value is 0.1241. df(numerator) = 3 (under "Factor") and df(denominator) = 16 (under "Error").
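
For readers working on a computer rather than a TI calculator, the same test can be sketched in Python using only the standard library. This is a minimal illustration, not the calculator's algorithm: the function names are assumptions, and the p-value is approximated by numerically integrating the F density with Simpson's rule (this sketch assumes the numerator degrees of freedom are greater than 2, so the density is 0 at x = 0).

```python
import math
from statistics import mean


def one_way_f(groups):
    """Return (F, df_num, df_denom) for a One-Way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k)), k - 1, n - k


def f_pdf(x, d1, d2):
    """Density of the F distribution with (d1, d2) degrees of freedom; assumes d1 > 2."""
    if x <= 0:
        return 0.0
    log_c = (math.lgamma((d1 + d2) / 2) - math.lgamma(d1 / 2) - math.lgamma(d2 / 2)
             + (d1 / 2) * math.log(d1 / d2))
    return math.exp(log_c + (d1 / 2 - 1) * math.log(x)
                    - ((d1 + d2) / 2) * math.log(1 + d1 * x / d2))


def f_upper_tail(x, d1, d2, steps=20000):
    """Approximate P(F > x) as 1 minus a Simpson's-rule integral of the density on [0, x]."""
    h = x / steps  # steps must be even for Simpson's rule
    total = f_pdf(0.0, d1, d2) + f_pdf(x, d1, d2)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f_pdf(i * h, d1, d2)
    return 1 - total * h / 3


# Mean grades from Table 13.4, one list per sorority
sororities = [
    [2.17, 1.85, 2.83, 1.69, 3.33],
    [2.63, 1.77, 3.25, 1.86, 2.21],
    [2.63, 3.78, 4.00, 2.55, 2.45],
    [3.79, 3.45, 3.08, 2.26, 3.18],
]

f_stat, df_num, df_denom = one_way_f(sororities)
p_value = f_upper_tail(f_stat, df_num, df_denom)
alpha = 0.01
print(f"F = {f_stat:.4f}, df = ({df_num}, {df_denom}), p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The results should agree with the calculator output quoted above to about three or four decimal places.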


Example  13.3 

A  fourth  grade  class  is  studying  the  environment.  One  of  the  assignments  is  to  grow  bean  plants 
in  different  soils.  Tommy  chose  to  grow  his  bean  plants  in  soil  found  outside  his  classroom  mixed 
with  dryer  lint.  Tara  chose  to  grow  her  bean  plants  in  potting  soil  bought  at  the  local  nursery. 
Nick  chose  to  grow  his  bean  plants  in  soil  from  his  mother's  garden.  No  chemicals  were  used 
on  the  plants,  only  water.  They  were  grown  inside  the  classroom  next  to  a  large  window.  Each 
child  grew  5  plants.  At  the  end  of  the  growing  period,  each  plant  was  measured,  producing  the 
following  data  (in  inches): 


Tommy's Plants   Tara's Plants   Nick's Plants

24               25              23

21               31              27

23               23              22

30               20              30

23               28              20

Table  13.5 
Problem 1

Does  it  appear  that  the  three  media  in  which  the  bean  plants  were  grown  produce  the  same  mean 
height?  Test  at  a  3%  level  of  significance. 

Solution 

This time, we will perform the calculations that lead to the F statistic. Notice that each group has
the same number of plants so we will use the formula $F = \frac{n \cdot s_{\bar{x}}^2}{s^2_{pooled}}$.

First,  calculate  the  sample  mean  and  sample  variance  of  each  group. 


                  Tommy's Plants   Tara's Plants   Nick's Plants

Sample Mean       24.2             25.4            24.4

Sample Variance   11.7             18.3            16.3

Table  13.6 


Next, calculate the variance of the three group means (Calculate the variance of 24.2, 25.4, and
24.4). Variance of the group means = 0.413 = $s_{\bar{x}}^2$

Then $MS_{between} = n s_{\bar{x}}^2 = (5)(0.413)$ where n = 5 is the sample size (number of plants each child
grew).

Calculate the mean of the three sample variances (Calculate the mean of 11.7, 18.3, and 16.3). Mean
of the sample variances = 15.433 = $s^2_{pooled}$

Then $MS_{within} = s^2_{pooled} = 15.433$.

The F statistic (or F ratio) is $F = \frac{MS_{between}}{MS_{within}} = \frac{n s_{\bar{x}}^2}{s^2_{pooled}} = \frac{(5)(0.413)}{15.433} = 0.134$

The dfs for the numerator = the number of groups - 1 = 3 - 1 = 2

The dfs for the denominator = the total number of samples - the number of groups = 15 - 3 = 12

The distribution for the test is $F_{2,12}$ and the F statistic is F = 0.134
The p-value is P(F > 0.134) = 0.8759.

Decision: Since $\alpha$ = 0.03 and the p-value = 0.8759, do not reject $H_0$. (Why?)

Conclusion: At the 3% level of significance, from the sample data, the evidence is not sufficient
to conclude that the mean heights of the bean plants are different.

(This  experiment  was  actually  done  by  three  classmates  of  the  son  of  one  of  the  authors.) 
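
The hand calculation in the solution to Problem 1 can be checked with a short Python sketch built on the equal-group-size formula. The plant heights come from Table 13.5; the variable names are illustrative assumptions. The p-value line uses a closed form that holds only when the numerator has exactly 2 degrees of freedom, as it does here, so it should not be reused for other cases.

```python
from statistics import mean, variance

# Plant heights in inches from Table 13.5
heights = {
    "Tommy": [24, 21, 23, 30, 23],
    "Tara": [25, 31, 23, 20, 28],
    "Nick": [23, 27, 22, 30, 20],
}

groups = list(heights.values())
k = len(groups)       # number of groups
n = len(groups[0])    # common group size

group_means = [mean(g) for g in groups]
group_variances = [variance(g) for g in groups]   # sample variances

var_of_means = variance(group_means)      # variance of the sample means
pooled_variance = mean(group_variances)   # mean of the sample variances

ms_between = n * var_of_means
ms_within = pooled_variance
f_stat = ms_between / ms_within

df_num = k - 1
df_denom = n * k - k

# Closed form for P(F > f), valid only when df_num == 2
p_value = (1 + 2 * f_stat / df_denom) ** (-df_denom / 2)

print(f"variance of means = {var_of_means:.3f}, pooled variance = {pooled_variance:.3f}")
print(f"F = {f_stat:.3f}, df = ({df_num}, {df_denom}), p-value is about {p_value:.3f}")
```

Because the p-value is far larger than 0.03, the sketch leads to the same decision as the solution: do not reject the null hypothesis.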

Another  fourth  grader  also  grew  bean  plants  but  this  time  in  a  jelly-like  mass.  The  heights  were 
(in  inches)  24, 28, 25, 30,  and  32. 

Problem  2  (Solution  on  p.  610.) 

Do  a  One-Way  ANOVA  test  on  the  4  groups.  You  may  use  your  calculator  or  computer  to 
perform  the  test.  Are  the  heights  of  the  bean  plants  different?  Use  a  solution  sheet  (Section  14.5.4). 
13.4.1 Optional Classroom Activity

From  the  class,  create  four  groups  of  the  same  size  as  follows:  men  under  22,  men  at  least  22,  women  under 
22,  women  at  least  22.  Have  each  member  of  each  group  record  the  number  of  states  in  the  United  States 
he  or  she  has  visited.  Run  an  ANOVA  test  to  determine  if  the  average  number  of  states  visited  in  the  four 
groups  are  the  same.  Test  at  a  1%  level  of  significance.  Use  one  of  the  solution  sheets  (Section  14.5.4)  at  the 
end  of  the  chapter  (after  the  homework). 


13.5  Test  of  Two  Variances^ 

Another  of  the  uses  of  the  F  distribution  is  testing  two  variances.  It  is  often  desirable  to  compare  two 
variances rather than two averages. For instance, college administrators would like two college professors
grading  exams  to  have  the  same  variation  in  their  grading.  In  order  for  a  lid  to  fit  a  container,  the  variation 
in  the  lid  and  the  container  should  be  the  same.  A  supermarket  might  be  interested  in  the  variability  of 
check-out  times  for  two  checkers. 

In  order  to  perform  a  F  test  of  two  variances,  it  is  important  that  the  following  are  true: 

1.  The  populations  from  which  the  two  samples  are  drawn  are  normally  distributed. 

2.  The  two  populations  are  independent  of  each  other. 

Suppose we sample randomly from two independent normal populations. Let $\sigma_1^2$ and $\sigma_2^2$ be the population
variances and $s_1^2$ and $s_2^2$ be the sample variances. Let the sample sizes be $n_1$ and $n_2$. Since we are interested
in comparing the two sample variances, we use the F ratio

$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$

F has the distribution $F \sim F(n_1 - 1, n_2 - 1)$

where $n_1 - 1$ are the degrees of freedom for the numerator and $n_2 - 1$ are the degrees of freedom for the
denominator.

If the null hypothesis is $\sigma_1^2 = \sigma_2^2$, then the F-Ratio becomes
$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} = \frac{s_1^2}{s_2^2}$.

NOTE: The F ratio could also be $\frac{s_2^2}{s_1^2}$. It depends on $H_a$ and on which sample variance is larger.

If the two populations have equal variances, then $s_1^2$ and $s_2^2$ are close in value and $F = \frac{s_1^2}{s_2^2}$ is close to 1. But
if the two population variances are very different, $s_1^2$ and $s_2^2$ tend to be very different, too. Choosing $s_1^2$ as
the larger sample variance causes the ratio $\frac{s_1^2}{s_2^2}$ to be greater than 1. If $s_1^2$ and $s_2^2$ are far apart, then
$F = \frac{s_1^2}{s_2^2}$ is a large number.

Therefore, if F is close to 1, the evidence favors the null hypothesis (the two population variances are equal).
But if F is much larger than 1, then the evidence is against the null hypothesis.


*This content is available online at <http://cnx.org/content/m17075/1.8/>.
A test of two variances may be left, right, or two-tailed.
Example  13.4 

Two  college  instructors  are  interested  in  whether  or  not  there  is  any  variation  in  the  way  they 
grade  math  exams.  They  each  grade  the  same  set  of  30  exams.  The  first  instructor's  grades  have  a 
variance  of  52.3.  The  second  instructor's  grades  have  a  variance  of  89.9. 


Problem

Test the claim that the first instructor's variance is smaller. (In most colleges, it is desirable for
the variances of exam grades to be nearly the same among instructors.) The level of significance is
10%.

Solution

Let 1 and 2 be the subscripts that indicate the first and second instructor, respectively.
$n_1 = n_2 = 30$.

$H_0$: $\sigma_1^2 = \sigma_2^2$ and $H_a$: $\sigma_1^2 < \sigma_2^2$

Calculate the test statistic: By the null hypothesis ($\sigma_1^2 = \sigma_2^2$), the F statistic is

$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2} = \frac{s_1^2}{s_2^2} = \frac{52.3}{89.9} = 0.5818$

Distribution for the test: $F_{29,29}$

where $n_1 - 1 = 29$ and $n_2 - 1 = 29$.


Graph: This test is left tailed.

Draw the graph, labeling and shading appropriately.

Figure 13.3: F distribution curve with F = 0.5818 marked and the left tail shaded (p-value = 0.0753).


Probability  statement:  p-value  =  P  (F  <  0.5818)  =  0.0753 


Compare $\alpha$ and the p-value: $\alpha$ = 0.10, so $\alpha$ > p-value.

Make a decision: Since $\alpha$ > p-value, reject $H_0$.
Conclusion: With a 10% level of significance, from the data, there is sufficient evidence to conclude
that  the  variance  in  grades  for  the  first  instructor  is  smaller. 

TI-83+ and TI-84: Press STAT and arrow over to TESTS. Arrow down to D:2-SampFTest. Press
ENTER. Arrow to Stats and press ENTER. For Sx1, n1, Sx2, and n2, enter √(52.3), 30, √(89.9), and
30. Press ENTER after each. Arrow to σ1: and <σ2. Press ENTER. Arrow down to Calculate and
press ENTER. F = 0.5818 and p-value = 0.0753. Do the procedure again and try Draw instead of
Calculate.
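
For readers without a TI calculator, the same left-tailed test can be sketched in Python using only the standard library. This is a minimal sketch, not the textbook's method: the helper names `regularized_incomplete_beta` and `f_cdf` are illustrative, and the F cumulative probability is approximated by Simpson's rule on the beta density (an assumption that works here because both degrees of freedom are at least 2).

```python
import math

def regularized_incomplete_beta(x, a, b, steps=20000):
    """Approximate I_x(a, b) by composite Simpson's rule (assumes a >= 1, b >= 1)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    # 1 / B(a, b), computed through log-gamma for numerical stability.
    norm = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

    def density(t):
        return norm * t ** (a - 1.0) * (1.0 - t) ** (b - 1.0)

    h = x / steps  # steps must be even for Simpson's rule
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4.0 if i % 2 else 2.0) * density(i * h)
    return total * h / 3.0

def f_cdf(f_value, df_num, df_den):
    """P(F < f_value) for an F distribution with (df_num, df_den) degrees of freedom."""
    x = df_num * f_value / (df_num * f_value + df_den)
    return regularized_incomplete_beta(x, df_num / 2.0, df_den / 2.0)

var_1, var_2 = 52.3, 89.9   # sample variances from Example 13.4
n_1 = n_2 = 30
alpha = 0.10

f_stat = var_1 / var_2
p_value = f_cdf(f_stat, n_1 - 1, n_2 - 1)   # left-tailed test: P(F < f_stat)

print(f"F = {f_stat:.4f}, p-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "do not reject H0")
```

The printed F statistic and p-value should agree with the calculator output quoted above, up to rounding.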
13.6 Summary

• A One-Way ANOVA hypothesis test determines if several population means are equal. The distribution
for the test is the F distribution with two different degrees of freedom.

Assumptions: 

a.  Each  population  from  which  a  sample  is  taken  is  assumed  to  be  normal. 

b.  Each  sample  is  randomly  selected  and  independent. 

c. The populations are assumed to have equal standard deviations (or variances).

• A Test of Two Variances hypothesis test determines if two variances are the same. The distribution
for the hypothesis test is the F distribution with two different degrees of freedom.

Assumptions: 

a.  The  populations  from  which  the  two  samples  are  drawn  are  normally  distributed. 

b.  The  two  populations  are  independent  of  each  other. 


This content is available online at <http://cnx.org/content/m17072/1.4/>.
13.7 Practice: One-Way ANOVA

13.7.1  Student  Learning  Outcome 

•  The  student  will  conduct  a  One-Way  ANOVA  hypothesis  test. 

13.7.2  Given 

Suppose a group is interested in determining whether teenagers obtain their drivers licenses at approximately
the same average age across the country. Suppose that the following data are randomly collected
from five teenagers in each region of the country. The numbers represent the age at which teenagers
obtained their drivers licenses.


Northeast   South   West   Central   East
16.3        16.9    16.4   16.2      17.1
16.1        16.5    16.5   16.6      17.2
16.4        16.4    16.6   16.5      16.6
16.5        16.2    16.1   16.4      16.8
x̄ =
s² =

Table 13.7


13.7.3  Hypothesis 

Exercise  13.7.1 

State the hypotheses.

$H_0$:

$H_a$:

13.7.4 Data Entry

Enter the data into your calculator or computer.

Exercise 13.7.2 (Solution on p. 610.)

degrees of freedom - numerator: df(n) =

Exercise 13.7.3 (Solution on p. 610.)

degrees of freedom - denominator: df(d) =

Exercise 13.7.4 (Solution on p. 610.)

F test statistic =

Exercise 13.7.5 (Solution on p. 610.)

p-value =


This content is available online at <http://cnx.org/content/m17067/1.10/>.
13.7.5 Decisions and Conclusions

State the decisions and conclusions (in complete sentences) for the following preconceived levels of $\alpha$.

Exercise 13.7.6

$\alpha = 0.05$

Decision:

Conclusion:

Exercise 13.7.7

$\alpha = 0.01$

Decision:

Conclusion:
13.8 Homework

Directions:  Use  a  solution  sheet  to  conduct  the  following  hypothesis  tests.  The  solution  sheet 
can  be  found  in  the  Table  of  Contents  14.  Appendix. 

Exercise  13.8.1  (Solution  on  p.  610.) 

Three students, Linda, Tuan, and Javier, are given 5 laboratory rats each for a nutritional experiment.
Each rat's weight is recorded in grams. Linda feeds her rats Formula A, Tuan feeds his rats
Formula  B,  and  Javier  feeds  his  rats  Formula  C.  At  the  end  of  a  specified  time  period,  each  rat  is 
weighed  again  and  the  net  gain  in  grams  is  recorded.  Using  a  significance  level  of  10%,  test  the 
hypothesis  that  the  three  formulas  produce  the  same  mean  weight  gain. 

Weights  of  Student  Lab  Rats 


Linda's rats   Tuan's rats   Javier's rats
43.5           47.0          51.2
39.4           40.5          40.9
41.3           38.9          37.9
46.0           46.3          45.0
38.2           44.2          48.6

Table 13.8


Exercise  13.8.2 

A grassroots group opposed to a proposed increase in the gas tax claimed that the increase
would hurt working-class people the most, since they commute the farthest to work. Suppose
that the group randomly surveyed 24 individuals and asked them their daily one-way commuting
mileage. The results are below. Using a 5% significance level, test the hypothesis that the 3
mean commuting mileages are the same.


working-class   professional (middle incomes)   professional (wealthy)
17.8            16.5                            8.5
26.7            17.4                            6.3
49.4            22.0                            4.6
9.4             7.4                             12.6
65.4            9.4                             11.0
47.1            2.1                             28.6
19.5            6.4                             15.4
51.2            13.9                            9.3

Table 13.9


Exercise  13.8.3  (Solution  on  p.  610.) 

Refer  to  Exercise  13.8.1.  Determine  whether  or  not  the  variance  in  weight  gain  is  statistically  the 
same  among  Javier's  and  Linda's  rats. 

This content is available online at <http://cnx.org/content/m17063/1.15/>.
Exercise 13.8.4

Refer to Exercise 13.8.2 above. Determine whether or not the variance in mileage
driven is statistically the same among the working class and professional (middle income) groups.

For the next two problems, refer to the data from Terri Vogel's Log Book.

http://cnx.org/content/m17132/latest/?collection=col10522/latest/

Exercise  13.8.5  (Solution  on  p.  610.) 

Examine  the  7  practice  laps.  Determine  whether  the  mean  lap  time  is  statistically  the  same  for  the 
7  practice  laps,  or  if  there  is  at  least  one  lap  that  has  a  different  mean  time  from  the  others. 

Exercise  13.8.6 

Examine practice laps 3 and 4. Determine whether or not the variance in lap time is statistically
the same for those practice laps.

For the next four problems, refer to the following data.

The following table lists the number of pages in four different types of magazines.


home decorating   news   health   computer
172               87     82       104
286               94     153      136
163               123    87       98
205               106    103      207
197               101    96       146

Table 13.10


Exercise  13.8.7  (Solution  on  p.  610.) 

Using a significance level of 5%, test the hypothesis that the four magazine types have the same
mean length.

Exercise  13.8.8 

Eliminate  one  magazine  type  that  you  now  feel  has  a  mean  length  different  than  the  others.  Redo 
the  hypothesis  test,  testing  that  the  remaining  three  means  are  statistically  the  same.  Use  a  new 
solution sheet. Based on this test, are the mean lengths for the remaining three magazines statistically
the same?

Exercise 13.8.9

Which two magazine types do you think have the same variance in length?
Exercise  13.8.10 

Which  two  magazine  types  do  you  think  have  different  variances  in  length? 

Exercise  13.8.11  (Solution  on  p.  611.) 

A researcher wants to know if the mean times (in minutes) that people watch their favorite news
station are the same. Suppose that the table below shows the results of a study.


CNN   FOX   Local
45    15    72
12    43    37
18    68    56
38    50    60
23    31    51
35    22

Table 13.11


Assume that all distributions are normal, the three population standard deviations are approximately
the same, and the data were collected independently and randomly. Use a level of significance
of 0.05.

Exercise  13.8.12 

Are the means for the final exams the same for all statistics class delivery types? The table below
shows the scores on final exams from several randomly selected classes that used the different
delivery types.


Online 

Hybrid 

Face-to-Face 

72 

83 

80 

84 

73 

78 

77 

84 

84 

80 

81 

81 

81 

86 

79 

82 

Table  13.12 


Assume that all distributions are normal, the three population standard deviations are approximately
the same, and the data were collected independently and randomly. Use a level of significance
of 0.05.

Exercise  13.8.13  (Solution  on  p.  611.) 

Is the mean number of times a month a person eats out the same for whites, blacks, Hispanics, and
Asians? Suppose that the table below shows the results of a study.


White 

Black 

Hispanic 

Asian 

6 

4 

7 

8 

8 

1 

3 

3 

2 

5 

5 

5 

4 

2 

4 

1 

6 

6 

7 
Table 13.13

Assume that all distributions are normal, the four population standard deviations are approximately
the same, and the data were collected independently and randomly. Use a level of significance
of 0.05.

Exercise  13.8.14 

Is the mean number of daily visitors to a ski resort the same for the three types of snow conditions?
Suppose that the table below shows the results of a study.


Powder   Machine Made   Hard Packed
1210     2107           2846
1080     1149           1638
1537     862            2019
941      1870           1178
1528     2233           1382

Table 13.14


Assume that all distributions are normal, the three population standard deviations are approximately
the same, and the data were collected independently and randomly. Use a level of significance
of 0.05.

Exercise  13.8.15  (Solution  on  p.  611.) 

Is the variance for the amount of money, in dollars, that shoppers spend on Saturdays at the mall
the same as the variance for the amount of money that shoppers spend on Sundays at the mall?
Suppose that the table below shows the results of a study.


Saturday   Sunday
75         44
62         137
18         58
0          82
150        61
124        39
94         19
50         127
62         99
31         141
73         60
118        73
89

Table 13.15
Assume that both distributions are normal. Use a level of significance of 0.05.
Exercise  13.8.16 

Are the variances for incomes on the East Coast and the West Coast the same? Suppose that the
table below shows the results of a study. Income is shown in thousands of dollars.


East   West
38     71
47     126
30     42
82     51
75     44
52     90
115    88
67

Table 13.16


Assume  that  both  distributions  are  normal.  Use  a  level  of  significance  of  0.05. 
'Exercises  11  -  16  were  contributed  by  Dr.  Larry  Green 
13.9 Review

The next two questions refer to the following situation:

Suppose  that  the  probability  of  a  drought  in  any  independent  year  is  20%.  Out  of  those  years  in  which  a 
drought  occurs,  the  probability  of  water  rationing  is  10%.  However,  in  any  year,  the  probability  of  water 
rationing  is  5%. 

Exercise  13.9.1  (Solution  on  p.  611.) 

What  is  the  probability  of  both  a  drought  and  water  rationing  occurring? 

Exercise  13.9.2  (Solution  on  p.  611.) 

Out  of  the  years  with  water  rationing,  find  the  probability  that  there  is  a  drought. 

The  next  three  questions  refer  to  the  following  survey: 

Favorite  Type  of  Pie  by  Gender 


         apple   pumpkin   pecan
female   40      10        30
male     20      30        10

Table 13.17

Exercise  13.9.3  (Solution  on  p.  611.) 

Suppose  that  one  individual  is  randomly  chosen.  Find  the  probability  that  the  person's  favorite 
pie  is  apple  or  the  person  is  male. 

Exercise  13.9.4  (Solution  on  p.  611.) 

Suppose  that  one  male  is  randomly  chosen.  Find  the  probability  his  favorite  pie  is  pecan. 

Exercise  13.9.5  (Solution  on  p.  611.) 

Conduct  a  hypothesis  test  to  determine  if  favorite  pie  type  and  gender  are  independent. 

The  next  two  questions  refer  to  the  following  situation: 

Let's say that the probability that an adult watches the news at least once per week is 0.60.

Exercise  13.9.6  (Solution  on  p.  611.) 

We  randomly  survey  14  people.  On  average,  how  many  people  do  we  expect  to  watch  the  news 
at  least  once  per  week? 

Exercise  13.9.7  (Solution  on  p.  611.) 

We  randomly  survey  14  people.  Of  interest  is  the  number  that  watch  the  news  at  least  once  per 
week.  State  the  distribution  of  X.  X  ~ 

Exercise  13.9.8  (Solution  on  p.  611.) 

The  following  histogram  is  most  likely  to  be  a  result  of  sampling  from  which  distribution? 


This content is available online at <http://cnx.org/content/m17070/1.9/>.
Figure 13.4


A.  Chi-Square 

B.  Geometric 

C.  Uniform 

D.  Binomial 

Exercise  13.9.9  (Solution  on  p.  611.) 

The ages of De Anza evening students are known to be normally distributed with a population
mean of 40 and a population standard deviation of 6. A sample of 6 De Anza evening students
reported their ages (in years) as: 28; 35; 47; 45; 30; 50. Find the probability that the mean of 6 ages
of randomly chosen students is less than 35 years. Hint: Find the sample mean.

The  next  three  questions  refer  to  the  following  situation: 

The amount of money a customer spends in one trip to the supermarket is known to have an exponential
distribution.  Suppose  the  mean  amount  of  money  a  customer  spends  in  one  trip  to  the  supermarket  is  $72. 

Exercise  13.9.10  (Solution  on  p.  612.) 

Find the probability that one customer spends less than $72 in one trip to the supermarket.

Exercise 13.9.11 (Solution on p. 612.)

Suppose 5 customers pool their money. (They are poor college students.) How much money
altogether would you expect the 5 customers to spend in one trip to the supermarket (in dollars)?

Exercise  13.9.12  (Solution  on  p.  612.) 

State  the  distribution  to  use  if  you  want  to  find  the  probability  that  the  mean  amount  spent  by  5 
customers  in  one  trip  to  the  supermarket  is  less  than  $60. 

Exercise  13.9.13  (Solution  on  p.  612.) 

A  math  exam  was  given  to  all  the  fifth  grade  children  attending  Country  School.  Two  random 
samples of scores were taken. The null hypothesis is that the mean math scores for boys and girls
in  fifth  grade  are  the  same.  Conduct  a  hypothesis  test. 


        n    x̄    s²
Boys    55   82   29
Girls   60   86   46

Table 13.18
Exercise 13.9.14 (Solution on p. 612.)

In  a  survey  of  80  males,  55  had  played  an  organized  sport  growing  up.  Of  the  70  females  surveyed, 
25  had  played  an  organized  sport  growing  up.  We  are  interested  in  whether  the  proportion  for 
males  is  higher  than  the  proportion  for  females.  Conduct  a  hypothesis  test. 

Exercise  13.9.15  (Solution  on  p.  612.) 

Which  of  the  following  is  preferable  when  designing  a  hypothesis  test? 

A. Maximize $\alpha$ and minimize $\beta$

B. Minimize $\alpha$ and maximize $\beta$

C. Maximize $\alpha$ and $\beta$

D. Minimize $\alpha$ and $\beta$

The  next  three  questions  refer  to  the  following  situation: 

120 people were surveyed as to their favorite beverage (non-alcoholic). The results are below.

Preferred  Beverage  by  Age 


         0-9   10-19   20-29   30+   Totals
Milk     14    10      6       0     30
Soda     3     8       26      15    52
Juice    7     12      12      7     38
Totals   24    30      44      22    120

Table 13.19


Exercise 13.9.16 (Solution on p. 612.)

Are the events of milk and 30+:

a. Independent events? Justify your answer.

b. Mutually exclusive events? Justify your answer.


Exercise  13.9.17  (Solution  on  p.  612.) 

Suppose that one person is randomly chosen. Find the probability that person is 10 - 19 given
that he/she prefers juice.

Exercise 13.9.18 (Solution on p. 612.)

Are Preferred Beverage and Age independent events? Conduct a hypothesis test.

Exercise  13.9.19  (Solution  on  p.  612.) 

Given  the  following  histogram,  which  distribution  is  the  data  most  likely  to  come  from? 
Figure 13.5


A.  uniform 

B.  exponential 

C.  normal 

D.  chi-square 
13.10 Lab: One-Way ANOVA

Class  Time: 
Names: 

13.10.1  Student  Learning  Outcome: 

•  The  student  will  conduct  a  simple  One-Way  ANOVA  test  involving  three  variables. 

13.10.2  Collect  the  Data 

1. Record the price per pound of 8 fruits, 8 vegetables, and 8 breads in your local supermarket.


Fruits 

Vegetables 

Breads 

Table  13.20 


2.  Explain  how  you  could  try  to  collect  the  data  randomly. 

13.10.3  Analyze  the  Data  and  Conduct  a  Hypothesis  Test 

1. Compute the following:

a. Fruit:

i. $\bar{x}$ =

ii. $s_x$ =

iii. n =

b. Vegetables:

i. $\bar{x}$ =

ii. $s_x$ =

iii. n =

c. Bread:

i. $\bar{x}$ =

ii. $s_x$ =

iii. n =

This content is available online at <http://cnx.org/content/m17061/1.9/>.
2. Find the following:

a. df(num) =

b. df(denom) =

3. State the approximate distribution for the test.

4. Test statistic: F =

5. Sketch a graph of this situation. CLEARLY label and scale the horizontal axis and shade the region(s)
corresponding to the p-value.

6. p-value =

7. Test at $\alpha = 0.05$. State your decision and conclusion.

8.  a.  Decision:  Why  did  you  make  this  decision? 

b.  Conclusion  (write  a  complete  sentence). 

c.  Based  on  the  results  of  your  study,  is  there  a  need  to  further  investigate  any  of  the  food  groups' 

prices?  Why  or  why  not? 
Solutions to Exercises in Chapter 13

Solution  to  Example  13.3,  Problem  2  (p.  592) 

•  F  =  0.9496 

•  p  -  value  =  0.4402 

From the sample data, the evidence is not sufficient to conclude that the mean heights of the bean plants
are  different. 

Solutions to Practice: One-Way ANOVA

Solution to Exercise 13.7.2 (p. 597)

df(n) = 4

Solution to Exercise 13.7.3 (p. 597)

df(d) = 15

Solution to Exercise 13.7.4 (p. 597)

Test statistic = F = 4.22

Solution to Exercise 13.7.5 (p. 597)

0.017
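
The practice answers above can be checked with a short standard-library sketch. It assumes the Table 13.7 data read column by column (four ages per region), and the function names `one_way_anova` and `f_upper_tail_df4` are illustrative, not from the text. The p-value uses a closed form that holds only when the numerator degrees of freedom equal 4, which is the case here.

```python
from statistics import mean

# Table 13.7, one list per region (assumed column-wise reading of the table).
ages = {
    "Northeast": [16.3, 16.1, 16.4, 16.5],
    "South": [16.9, 16.5, 16.4, 16.2],
    "West": [16.4, 16.5, 16.6, 16.1],
    "Central": [16.2, 16.6, 16.5, 16.4],
    "East": [17.1, 17.2, 16.6, 16.8],
}

def one_way_anova(groups):
    """Return (F, df_num, df_den) for a list of samples."""
    all_values = [v for g in groups for v in g]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    df_num = len(groups) - 1
    df_den = len(all_values) - len(groups)
    return (ss_between / df_num) / (ss_within / df_den), df_num, df_den

def f_upper_tail_df4(f_value, df_den):
    """P(F > f_value) when the numerator df is exactly 4 (closed form)."""
    x = 4.0 * f_value / (4.0 * f_value + df_den)
    b = df_den / 2.0
    return (1.0 - x) ** b * (1.0 + b * x)

f_stat, df_num, df_den = one_way_anova(list(ages.values()))
p_value = f_upper_tail_df4(f_stat, df_den)

print(f"df(n) = {df_num}, df(d) = {df_den}")
print(f"F = {f_stat:.2f}, p-value = {p_value:.3f}")
```

For other numerator degrees of freedom, a general F distribution routine (calculator or statistics software) would be needed in place of `f_upper_tail_df4`.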

Solutions  to  Homework 

Solution  to  Exercise  13.8.1  (p.  599) 

a. $H_0: \mu_L = \mu_T = \mu_J$

c. df(n) = 2; df(d) = 12

e.  0.67 

f.  0.5305 

h.  Decision:  Do  not  reject  null;  Conclusion:  There  is  insufficient  evidence  to  conclude  that  the  means  are 
different. 

Solution  to  Exercise  13.8.3  (p.  599) 

c. df(n) = 4; df(d) = 4

e. 3.00

f. 2(0.1563) = 0.3126. Using the TI-83+/84+ function 2-SampFtest, you get the test statistic as 2.9986
and the p-value directly as 0.3127. If you input the lists in a different order, you get a test statistic of 0.3335,
but the p-value is the same because this is a two-tailed test.

h. Decision: Do not reject null; Conclusion: There is insufficient evidence to conclude that the variances are
different.
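
The remark about list order can be illustrated with a small standard-library sketch. The data are Linda's and Javier's columns from Table 13.8; the helper `f_cdf_df_4_4` is an illustrative name and relies on a closed form that is valid only for an F distribution with 4 and 4 degrees of freedom.

```python
from statistics import variance

linda = [43.5, 39.4, 41.3, 46.0, 38.2]
javier = [51.2, 40.9, 37.9, 45.0, 48.6]

def f_cdf_df_4_4(f_value):
    """P(F < f_value) for F(4, 4): I_x(2, 2) = 3x^2 - 2x^3 with x = f / (f + 1)."""
    x = f_value / (f_value + 1.0)
    return 3.0 * x ** 2 - 2.0 * x ** 3

f_stat = variance(javier) / variance(linda)      # larger sample variance on top
f_swapped = variance(linda) / variance(javier)   # lists entered in the other order

p_value = 2.0 * (1.0 - f_cdf_df_4_4(f_stat))     # two-tailed: double the upper tail
p_swapped = 2.0 * f_cdf_df_4_4(f_swapped)        # two-tailed: double the lower tail

print(f"F = {f_stat:.4f}, p-value = {p_value:.4f}")
print(f"swapped F = {f_swapped:.4f}, p-value = {p_swapped:.4f}")
```

Swapping the lists inverts the statistic, but the doubled tail area is unchanged, which is why the two-tailed p-value does not depend on the order.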

Solution  to  Exercise  13.8.5  (p.  600) 

c. df(n) = 6; df(d) = 98

e.  1.69 

f.  0.1319 

h.  Decision:  Do  not  reject  null;  Conclusion:  There  is  insufficient  evidence  to  conclude  that  the  mean  lap 
times  are  different. 

Solution  to  Exercise  13.8.7  (p.  600) 

a. $H_0: \mu_d = \mu_n = \mu_h = \mu_c$

b. Alternate Hypothesis: At least one pair of means is different

c. df(n) = 3; df(d) = 16

e. 8.69

f.  0.0012 

h.  Decision:  Reject  null;  Conclusion:  There  is  sufficient  evidence  to  conclude  that  the  mean  lengths  are 
different. 

Solution  to  Exercise  13.8.11  (p.  600) 

c:  df(n)  =  2;  df(d)  =  14 

d: $F_{2,14}$
e: 4.08
f: 0.0401
h:

ii: Reject the null hypothesis

iv:  There  is  sufficient  evidence  to  conclude  that  the  mean  times  are  different. 

Solution  to  Exercise  13.8.13  (p.  601) 

c:  df(n)  =  3;  df(d)  =  15 

d: $F_{3,15}$

e:  0.8853 

f:  0.4711 

h: 

ii:  Do  not  reject  the  null  hypothesis 

iv:  There  is  insufficient  evidence  to  conclude  that  the  mean  number  of  times  are  different. 

Solution  to  Exercise  13.8.15  (p.  602) 

c:  df(n)  =  11;  df(d)  =  12 

d: $F_{11,12}$

e:  1.35 

f:  0.6090 

h: 

ii:  Do  not  reject  the  null  hypothesis 

iv:  There  is  insufficient  evidence  to  conclude  that  the  variances  are  different. 

Solutions  to  Review 

Solution  to  Exercise  13.9.1  (p.  604) 

0.02 

Solution  to  Exercise  13.9.2  (p.  604) 

0.40 
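
The two answers above follow from the multiplication rule and the definition of conditional probability. A minimal sketch, with variable names chosen here for illustration:

```python
p_drought = 0.20                  # P(drought)
p_rationing_given_drought = 0.10  # P(rationing | drought)
p_rationing = 0.05                # P(rationing)

# Multiplication rule: P(drought and rationing)
p_both = p_drought * p_rationing_given_drought

# Conditional probability: P(drought | rationing)
p_drought_given_rationing = p_both / p_rationing

print(round(p_both, 2), round(p_drought_given_rationing, 2))
```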

Solution  to  Exercise  13.9.3  (p.  604) 

100/140

Solution  to  Exercise  13.9.4  (p.  604) 

10/60

Solution  to  Exercise  13.9.5  (p.  604) 

p-value = 0; Reject null; Conclude dependent events
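
A sketch of the test of independence behind this answer, using the counts from Table 13.17. The variable names are illustrative. The p-value line uses the fact that, for 2 degrees of freedom, the chi-square upper tail equals exp(-x/2); other table sizes would need a general chi-square routine.

```python
import math

observed = [
    [40, 10, 30],   # female: apple, pumpkin, pecan
    [20, 30, 10],   # male:   apple, pumpkin, pecan
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi_sq = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / total
        chi_sq += (obs - expected) ** 2 / expected

df = (len(observed) - 1) * (len(observed[0]) - 1)
assert df == 2  # the closed-form tail below is only valid for df = 2
p_value = math.exp(-chi_sq / 2.0)

print(f"chi-square = {chi_sq:.2f}, df = {df}, p-value = {p_value:.6f}")
```

The p-value is far below any usual significance level, consistent with the rounded value of 0 reported above.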

Solution  to  Exercise  13.9.6  (p.  604) 

8.4 

Solution  to  Exercise  13.9.7  (p.  604) 

B  (14,0.60) 

Solution  to  Exercise  13.9.8  (p.  604) 

D 
Solution to Exercise 13.9.9 (p. 605)

0.3669 
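
The exercise's hint suggests using the sample mean of the six reported ages as the cutoff. A minimal sketch under that reading (an assumption about how the listed answer was obtained), using the standard-library `statistics.NormalDist`:

```python
from math import sqrt
from statistics import NormalDist, mean

ages = [28, 35, 47, 45, 30, 50]
sample_mean = mean(ages)   # 235 / 6

# Sampling distribution of the mean of 6 ages: normal with mean 40
# and standard deviation 6 / sqrt(6).
sampling_dist = NormalDist(mu=40, sigma=6 / sqrt(len(ages)))
probability = sampling_dist.cdf(sample_mean)

print(f"sample mean = {sample_mean:.2f}, P(mean < sample mean) = {probability:.4f}")
```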

Solution  to  Exercise  13.9.10  (p.  605) 

0.6321 

Solution  to  Exercise  13.9.11  (p.  605) 

$360 
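
Both exponential answers can be reproduced in a few lines. This sketch assumes the usual parameterization in which the decay rate is the reciprocal of the mean; the variable names are illustrative.

```python
import math

mean_spend = 72.0          # mean amount per trip, in dollars
rate = 1.0 / mean_spend    # exponential decay parameter

# P(X < 72) = 1 - e^(-rate * 72) = 1 - e^(-1)
p_less_than_72 = 1.0 - math.exp(-rate * 72.0)

# Expected total for 5 customers: 5 times the mean
expected_total = 5 * mean_spend

print(round(p_less_than_72, 4), expected_total)
```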

Solution to Exercise 13.9.12 (p. 605)

$N\left(72, \frac{72}{\sqrt{5}}\right)$

Solution  to  Exercise  13.9.13  (p.  605) 

p-value  =  0.0006;  Reject  null;  Conclude  averages  are  not  equal 

Solution to Exercise 13.9.14 (p. 606)

p-value = 0; Reject null; Conclude proportion of males is higher
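
A sketch of the right-tailed two-proportion test behind this answer, using the pooled proportion for the standard error and the standard-library `statistics.NormalDist`. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_male, n_male = 55, 80       # males who played an organized sport
x_female, n_female = 25, 70   # females who played an organized sport

p_male = x_male / n_male
p_female = x_female / n_female
p_pooled = (x_male + x_female) / (n_male + n_female)

std_error = sqrt(p_pooled * (1 - p_pooled) * (1 / n_male + 1 / n_female))
z = (p_male - p_female) / std_error

# Right-tailed test: Ha is that the male proportion is higher.
p_value = 1.0 - NormalDist().cdf(z)

print(f"z = {z:.3f}, p-value = {p_value:.6f}")
```

The p-value is a small positive number that rounds to 0 at four decimal places.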

Solution  to  Exercise  13.9.15  (p.  606) 

D 

Solution  to  Exercise  13.9.16  (p.  606) 

a.  No 

b.  Yes,  P  (M  and  30+)  =  0 

Solution to Exercise 13.9.17 (p. 606)

12/38

Solution to Exercise 13.9.18 (p. 606)

No; p-value = 0

Solution to Exercise 13.9.19 (p. 606)

A
14.1 Practice Final Exam 1

Questions 1-2 refer to the following:

An experiment consists of tossing two 12-sided dice (the numbers 1-12 are printed on the sides of each
die).

•  Let  Event  A  =  both  dice  show  an  even  number 

•  Let  Event  B  =  both  dice  show  a  number  more  than  8 

Exercise  14.1.1  (Solution  on  p.  668.) 

Events  A  and  B  are: 


A.  Mutually  exclusive. 

B.  Independent. 

C.  Mutually  exclusive  and  independent. 

D.  Neither  mutually  exclusive  nor  independent. 

Exercise  14.1.2 

Find P(A|B)

A  2 

A.  4 

B.  144 

c  ^ 

^-  16 
D.  ^ 


(Solution  on  p.  668.) 


Exercise  14.1.3  (Solution  on  p.  668.) 

Which of the following are TRUE when we perform a hypothesis test on matched or paired samples?

A.  Sample  sizes  are  almost  never  small. 

B.  Two  measurements  are  drawn  from  the  same  pair  of  individuals  or  objects. 

C.  Two  sample  means  are  compared  to  each  other. 

D.  Answer  choices  B  and  C  are  both  true. 


Questions  4-5  refer  to  the  following: 

118 students were asked what type of color their bedrooms were painted: light colors, dark colors or vibrant
colors.  The  results  were  tabulated  according  to  gender. 

This content is available online at <http://cnx.org/content/m16304/1.17/>.
         Light colors   Dark colors   Vibrant colors
Female   20             22            28
Male     10             30            8

Table 14.1

Exercise  14.1.4  (Solution  on  p.  668.) 

Find  the  probability  that  a  randomly  chosen  student  is  male  or  has  a  bedroom  painted  with  light 
colors. 

^-  118 

TT8 
r  48 

'^^  118 

D  15 

48 

Exercise  14.1.5  (Solution  on  p.  668.) 

Find  the  probability  that  a  randomly  chosen  student  is  male  given  the  student's  bedroom  is 
painted  with  dark  colors. 

A  ^ 
^-  118 
R  30 
48 

c  ^ 

118 


D. 


30 
52 


Questions  6-7  refer  to  the  following: 

We  are  interested  in  the  number  of  times  a  teenager  must  be  reminded  to  do  his/her  chores  each  week.  A 
survey  of  40  mothers  was  conducted.  The  table  below  shows  the  results  of  the  survey. 


x   P(x)
0   2/40
1   5/40
2
3   14/40
4   7/40
5   4/40

Table 14.2

Exercise 14.1.6 (Solution on p. 668.)

Find the probability that a teenager is reminded 2 times.


A.  8 

R  8 

40 

C  ^ 

D.  2 

Exercise  14.1.7  (Solution  on  p.  668.) 

Find  the  expected  number  of  times  a  teenager  is  reminded  to  do  his/her  chores. 
A. 15

B.  2.78 

C.  1.0 

D.  3.13 

Questions  8-9  refer  to  the  following: 

On any given day, approximately 37.5% of the cars parked in the De Anza parking structure are parked
crookedly. (Survey done by Kathy Plum.) We randomly survey 22 cars. We are interested in the number of
cars that are parked crookedly.

Exercise 14.1.8 (Solution on p. 668.)

For every 22 cars, how many would you expect to be parked crookedly, on average?

A. 8.25

B. 11

C. 18

D. 7.5

Exercise 14.1.9 (Solution on p. 668.)

What is the probability that at least 10 of the 22 cars are parked crookedly?

A. 0.1263

B. 0.1607

C. 0.2870

D. 0.8393

Exercise 14.1.10 (Solution on p. 668.)

Using a sample of 15 Stanford-Binet IQ scores, we wish to conduct a hypothesis test. Our claim
is that the mean IQ score on the Stanford-Binet IQ test is more than 100. It is known that the
standard deviation of all Stanford-Binet IQ scores is 15 points. The correct distribution to use for
the hypothesis test is:

A. Binomial

B. Student's-t

C. Normal

D. Uniform

Questions 11 - 13 refer to the following:

De Anza College keeps statistics on the pass rate of students who enroll in math classes. In a sample of 1795
students enrolled in Math 1A (1st quarter calculus), 1428 passed the course. In a sample of 856 students
enrolled in Math 1B (2nd quarter calculus), 662 passed. In general, are the pass rates of Math 1A and Math
1B statistically the same? Let A = the subscript for Math 1A and B = the subscript for Math 1B.

Exercise 14.1.11 (Solution on p. 668.)

If you were to conduct an appropriate hypothesis test, the alternate hypothesis would be:

A. $H_a: p_A = p_B$

B. $H_a: p_A > p_B$

C. $H_0: p_A = p_B$

D. $H_a: p_A \neq p_B$

Exercise 14.1.12 (Solution on p. 668.)

The Type I error is to:

A. conclude that the pass rate for Math 1A is the same as the pass rate for Math 1B when, in fact,
the pass rates are different.

B. conclude that the pass rate for Math 1A is different than the pass rate for Math 1B when, in fact,
the pass rates are the same.

C. conclude that the pass rate for Math 1A is greater than the pass rate for Math 1B when, in fact,
the pass rate for Math 1A is less than the pass rate for Math 1B.

D. conclude that the pass rate for Math 1A is the same as the pass rate for Math 1B when, in fact,
they are the same.

Exercise  14.1.13  (Solution  on  p.  668.) 

The  correct  decision  is  to: 

A. reject $H_0$

B. not reject $H_0$

C. There is not enough information given to conduct the hypothesis test

Kia,  Alejandra,  and  Iris  are  runners  on  the  track  teams  at  three  different  schools.  Their  running  times,  in 
minutes,  and  the  statistics  for  the  track  teams  at  their  respective  schools,  for  a  one  mile  run,  are  given  in  the 
table  below: 


            Running Time   School Average Running Time   School Standard Deviation
Kia         4.9            5.2                           .15
Alejandra   4.2            4.6                           .25
Iris        4.5            4.9                           .12

Table 14.3


Exercise  14.1.14  (Solution  on  p.  668.) 

Which  student  is  the  BEST  when  compared  to  the  other  runners  at  her  school? 

A.  Kia 

B.  Alejandra 

C.  Iris 

D.  Impossible  to  determine 
Questions  15  - 16  refer  to  the  following: 

The  following  adult  ski  sweater  prices  are  from  the  Gorsuch  Ltd.  Winter  catalog: 
{$212, $292, $278, $199, $280, $236}

Assume  the  underlying  sweater  price  population  is  approximately  normal.  The  null  hypothesis  is  that  the 
mean  price  of  adult  ski  sweaters  from  Gorsuch  Ltd.  is  at  least  $275. 

Exercise  14.1.15  (Solution  on  p.  668.) 

The correct distribution to use for the hypothesis test is:

A.  Normal 

B.  Binomial 

C.  Student's-t 

D.  Exponential 
Exercise 14.1.16 (Solution on p. 668.)

The  hypothesis  test: 

A.  is  two-tailed 

B.  is  left-tailed 

C.  is  right-tailed 

D.  has  no  tails 


Exercise  14.1.17  (Solution  on  p.  668.) 

Sara, a statistics student, wanted to determine the mean number of books that college professors
have in their office. She randomly selected 2 buildings on campus and asked each professor in the
selected buildings how many books are in his/her office. Sara surveyed 25 professors. The type
of sampling selected is a:

A.  simple  random  sampling 

B.  systematic  sampling 

C.  cluster  sampling 

D.  stratified  sampling 

Exercise  14.1.18  (Solution  on  p.  668.) 

A clothing store would use which measure of the center of data when placing orders for the
typical "middle" customer?

A.  Mean 

B.  Median 

C.  Mode 

D.  IQR 

Exercise  14.1.19  (Solution  on  p.  668.) 

In a hypothesis test, the p-value is

A. the probability that an outcome of the data will happen purely by chance when the null hypothesis
is true.

B. called the preconceived alpha.

C. compared to beta to decide whether to reject or not reject the null hypothesis.

D.  Answer  choices  A  and  B  are  both  true. 


Questions  20  -  22  refer  to  the  following: 

A  community  college  offers  classes  6  days  a  week:  Monday  through  Saturday.  Maria  conducted  a  study  of 
the  students  in  her  classes  to  determine  how  many  days  per  week  the  students  who  are  in  her  classes  come 
to  campus  for  classes.  In  each  of  her  5  classes  she  randomly  selected  10  students  and  asked  them  how  many 
days they come to campus for classes. Each of her classes is the same size. The results of her survey are
summarized  in  the  table  below. 


Number of Days on Campus | Frequency | Relative Frequency | Cumulative Relative Frequency
1 | 2 | |
2 | 12 | .24 |
3 | 10 | .20 |
4 | | | .98
5 | 0 | |
6 | 1 | .02 | 1.00

Table 14.4

Exercise 14.1.20 (Solution on p. 668.)

Combined with convenience sampling, what other sampling technique did Maria use?

A. simple random

B. systematic

C. cluster

D. stratified

Exercise 14.1.21 (Solution on p. 668.)

How  many  students  come  to  campus  for  classes  4  days  a  week? 

A.  49 

B.  25 

C.  30 

D.  13 

Exercise 14.1.22 (Solution on p. 668.)

What is the 60th percentile for this data?

A. 2

B. 3

C. 4

D. 5


The  next  two  questions  refer  to  the  following: 

The  following  data  are  the  results  of  a  random  survey  of  110  Reservists  called  to  active  duty  to  increase 
security  at  California  airports. 


Number of Dependents | Frequency
0 | 11
1 | 27
2 | 33
3 | 20
4 | 19

Table  14.5 

Exercise  14.1.23  (Solution  on  p.  668.) 

Construct  a  95%  Confidence  Interval  for  the  true  population  mean  number  of  dependents  of 
Reservists  called  to  active  duty  to  increase  security  at  California  airports. 

A.  (1.85,2.32) 

B.  (1.80,2.36) 

C.  (1.97,2.46) 

D.  (1.92,2.50) 


Exercise 14.1.24 (Solution on p. 668.)

The 95% confidence interval above means:

A. 5% of confidence intervals constructed this way will not contain the true population average number of dependents.

B.  We  are  95%  confident  the  true  population  mean  number  of  dependents  falls  in  the  interval. 

C.  Both  of  the  above  answer  choices  are  correct. 

D.  None  of  the  above. 

Exercise  14.1.25  (Solution  on  p.  669.) 

X ~ U(4, 10). Find the 30th percentile.

A.  0.3000 

B.  3 

C.  5.8 

D.  6.1 
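
A percentile of a uniform distribution can be checked with a few lines of code. The sketch below assumes the usual definition (the value k with P(X < k) = p); the function name uniform_percentile is only illustrative.

```python
def uniform_percentile(a, b, p):
    """Return the value k with P(X < k) = p for X ~ U(a, b)."""
    # The area to the left of k is (k - a) / (b - a); solve that for k.
    return a + p * (b - a)

k = uniform_percentile(4, 10, 0.30)
print(round(k, 2))
```

The same function works for any other percentile by changing p.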

Exercise  14.1.26  (Solution  on  p.  669.) 

If X ~ Exp(0.8), then P(X < μ) =

A.  0.3679 

B.  0.4727 

C.  0.6321 

D.  cannot  be  determined 
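
One way to check a probability like this is to evaluate the exponential cumulative distribution function directly. This is a minimal sketch that assumes 0.8 is the decay rate, so that the mean is 1/0.8; the helper name exponential_cdf is an illustration, not part of the text.

```python
import math

def exponential_cdf(x, rate):
    """P(X < x) for an exponential distribution with the given decay rate."""
    return 1 - math.exp(-rate * x)

rate = 0.8
mean = 1 / rate                      # mean of an exponential distribution
p_below_mean = exponential_cdf(mean, rate)
print(round(p_below_mean, 4))
```

Because rate times mean is always 1, the result does not depend on the particular rate chosen.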

Exercise 14.1.27 (Solution on p. 669.)

The lifetime of a computer circuit board is normally distributed with a mean of 2500 hours and a
standard deviation of 60 hours. What is the probability that a randomly chosen board will last at
most 2560 hours?

A. 0.8413

B. 0.1587

C. 0.3461

D. 0.6539

Exercise 14.1.28 (Solution on p. 669.)

A survey of 123 Reservists called to active duty as a result of the September 11, 2001, attacks
was conducted to determine the proportion that were married. Eighty-six reported being married.
Construct a 98% confidence interval for the true population proportion of reservists called to active
duty that are married.

A. (0.6030, 0.7954)

B. (0.6181, 0.7802)

C. (0.5927, 0.8057)

D. (0.6312, 0.7672)

Exercise 14.1.29 (Solution on p. 669.)

Winning times in 26 mile marathons run by world class runners average 145 minutes with a
standard deviation of 14 minutes. A sample of the last 10 marathon winning times is collected.

Let X̄ = mean winning times for 10 marathons.

The distribution for X̄ is:

B. N(145, 14)

C. t₉

D. t₁₀
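
The normal probability in Exercise 14.1.27 and the confidence interval in Exercise 14.1.28 can be checked numerically. The sketch below uses Python's statistics.NormalDist and assumes the usual normal-approximation interval for a proportion, p' ± z·sqrt(p'q'/n); the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Exercise 14.1.27: P(X <= 2560) for X ~ N(2500, 60)
p_board = NormalDist(mu=2500, sigma=60).cdf(2560)

# Exercise 14.1.28: 98% confidence interval for a proportion
n, married = 123, 86
p_hat = married / n
z = NormalDist().inv_cdf(0.99)       # 98% in the middle leaves 1% in each tail
error_bound = z * math.sqrt(p_hat * (1 - p_hat) / n)
interval = (p_hat - error_bound, p_hat + error_bound)

print(round(p_board, 4))
print(tuple(round(v, 4) for v in interval))
```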

Exercise  14.1.30  (Solution  on  p.  669.) 

Suppose that Phi Beta Kappa honors the top 1% of college and university seniors. Assume that
grade point means (G.P.A.) at a certain college are normally distributed with a 2.5 mean and a
standard deviation of 0.5. What would be the minimum G.P.A. needed to become a member of
Phi Beta Kappa at that college?

A.  3.99 

B.  1.34 

C.  3.00 

D.  3.66 
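
A cutoff for the top 1% is the 99th percentile of the distribution. As a rough check, assuming the normal model stated in the exercise, the inverse cumulative distribution function gives it directly:

```python
from statistics import NormalDist

gpa = NormalDist(mu=2.5, sigma=0.5)
cutoff = gpa.inv_cdf(0.99)           # 99% of seniors fall below this G.P.A.
print(round(cutoff, 2))
```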

The number of people living on American farms has declined steadily during this century. Here are data
on the farm population (in millions of persons) from 1935 to 1980.

Year | 1935 | 1940 | 1945 | 1950 | 1955 | 1960 | 1965 | 1970 | 1975 | 1980
Population | 32.1 | 30.5 | 24.4 | 23.0 | 19.1 | 15.6 | 12.4 | 9.7 | 8.9 | 7.2

Table 14.6

The linear regression equation is y-hat = 1166.93 - 0.5868x

Exercise 14.1.31 (Solution on p. 669.)

What was the expected farm population (in millions of persons) for 1980?

A. 7.2

B. 5.1

C. 6.0

D. 8.0

Exercise 14.1.32 (Solution on p. 669.)

In linear regression, which is the best possible SSE?

A. 13.46

B. 18.22

C. 24.05

D. 16.33
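
The expected value asked for in Exercise 14.1.31 comes from substituting the year into the regression equation y-hat = 1166.93 - 0.5868x. A minimal sketch (the function name predict_population is only illustrative):

```python
def predict_population(year, intercept=1166.93, slope=-0.5868):
    """Predicted farm population (millions) from the stated regression line."""
    return intercept + slope * year

prediction_1980 = predict_population(1980)
print(round(prediction_1980, 1))
```

Note that the prediction is the value on the line, not the observed table entry for that year.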


Exercise  14.1.33  (Solution  on  p.  669.) 

In  regression  analysis,  if  the  correlation  coefficient  is  close  to  1  what  can  be  said  about  the  best  fit 
line? 

A.  It  is  a  horizontal  line.  Therefore,  we  can  not  use  it. 

B.  There  is  a  strong  linear  pattern.  Therefore,  it  is  most  likely  a  good  model  to  be  used. 

C.  The  coefficient  correlation  is  close  to  the  limit.  Therefore,  it  is  hard  to  make  a  decision. 

D.  We  do  not  have  the  equation.  Therefore,  we  can  not  say  anything  about  it. 

Questions 34 - 36 refer to the following:

A study of the career plans of young women and men sent questionnaires to all 722 members of the senior
class in the College of Business Administration at the University of Illinois. One question asked which
major within the business program the student had chosen. Here are the data from the students who
responded.

Major | Female | Male
Accounting | 68 | 56
Administration | 91 | 40
Economics | 5 | 6
Finance | 61 | 59

Table 14.7: Does the data suggest that there is a relationship between the gender of students and their
choice of major?

Exercise  14.1.34  (Solution  on  p.  669.) 

The distribution for the test is:

A. Chi²₈

B. Chi²₃

C. t₇₂₁

D. N(0,1)


Exercise  14.1.35  (Solution  on  p.  669.) 

The expected number of females who choose Finance is:

A.  37 

B.  61 

C.  60 

D.  70 
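
In a test of independence, an expected count is (row total)(column total) / (grand total). The sketch below applies that formula to Table 14.7; the dictionary layout and the function name expected_count are assumptions made for illustration.

```python
# Observed counts from Table 14.7: major -> (female, male)
observed = {
    "Accounting": (68, 56),
    "Administration": (91, 40),
    "Economics": (5, 6),
    "Finance": (61, 59),
}

def expected_count(table, major, column):
    """Expected count under independence: row total * column total / grand total."""
    row_total = sum(table[major])
    column_total = sum(counts[column] for counts in table.values())
    grand_total = sum(sum(counts) for counts in table.values())
    return row_total * column_total / grand_total

expected_female_finance = expected_count(observed, "Finance", 0)
print(round(expected_female_finance, 2))
```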


Exercise  14.1.36  (Solution  on  p.  669.) 

The  p-value  is  0.0127  and  the  level  of  significance  is  0.05.  The  conclusion  to  the  test  is: 

A. There is insufficient evidence to conclude that the choice of major and the gender of the student are not independent of each other.

B. There is sufficient evidence to conclude that the choice of major and the gender of the student are not independent of each other.

C. There is sufficient evidence to conclude that students find Economics very hard.

D. There is insufficient evidence to conclude that more females prefer Administration than males.

Exercise  14.1.37  (Solution  on  p.  669.) 

An agency reported that the work force nationwide is composed of 10% professional, 10% clerical,
30% skilled, 15% service, and 35% semiskilled laborers. A random sample of 100 San Jose residents
indicated 15 professional, 15 clerical, 40 skilled, 10 service, and 20 semiskilled laborers. At α = 0.10
does the work force in San Jose appear to be consistent with the agency report for the nation?
Which kind of test is it?

A. Chi² goodness of fit

B. Chi² test of independence

C.  Independent  groups  proportions 

D.  Unable  to  determine 
14.2 Practice Final Exam 2'


Exercise  14.2.1  (Solution  on  p.  669.) 

A study was done to determine the proportion of teenagers that own a car. The population
proportion of teenagers that own a car is the

A.  statistic 

B.  parameter 

C.  population 

D.  variable 


The  next  two  questions  refer  to  the  following  data: 


value | frequency
0 | 1
1 | 4
2 | 7
3 | 9
6 | 4

Table  14.8 

Exercise  14.2.2  (Solution  on  p.  669.) 

The  box  plot  for  the  data  is: 


[Answer choices A through D are box plots; the figures are not reproduced here.]


^This  content  is  available  online  at  <http://cnx.org/content/ml6303/1.16/>. 

Exercise  14.2.3  (Solution  on  p.  669.) 

If  6  were  added  to  each  value  of  the  data  in  the  table,  the  15th  percentile  of  the  new  list  of  values 
is: 

A.  6 

B.  1 

C.  7 

D.  8 


The  next  two  questions  refer  to  the  following  situation: 

Suppose  that  the  probability  of  a  drought  in  any  independent  year  is  20%.  Out  of  those  years  in  which  a 
drought  occurs,  the  probability  of  water  rationing  is  10%.  However,  in  any  year,  the  probability  of  water 
rationing  is  5%. 

Exercise  14.2.4  (Solution  on  p.  669.) 

What  is  the  probability  of  both  a  drought  and  water  rationing  occurring? 

A.  0.05 

B.  0.01 

C.  0.02 

D.  0.30 


Exercise  14.2.5  (Solution  on  p.  669.) 

Which  of  the  following  is  true? 

A.  drought  and  water  rationing  are  independent  events 

B.  drought  and  water  rationing  are  mutually  exclusive  events 

C.  none  of  the  above 
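
The drought questions rest on the multiplication rule, P(A and B) = P(A)·P(B|A), and on comparing P(B|A) with P(B). A short sketch, using only the three probabilities stated in the situation (the variable names are illustrative):

```python
import math

p_drought = 0.20
p_rationing_given_drought = 0.10
p_rationing = 0.05

# Multiplication rule: P(drought AND rationing)
p_both = p_drought * p_rationing_given_drought

# Independent only if knowing about a drought does not change P(rationing)
independent = math.isclose(p_rationing_given_drought, p_rationing)

# Mutually exclusive only if the two events can never happen together
mutually_exclusive = math.isclose(p_both, 0.0, abs_tol=1e-12)

print(round(p_both, 2), independent, mutually_exclusive)
```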


The  next  two  questions  refer  to  the  following  situation: 

Suppose  that  a  survey  yielded  the  following  data: 

Favorite Pie Type

gender | apple | pumpkin | pecan
female | 40 | 10 | 30
male | 20 | 30 | 10

Table  14.9 


Exercise  14.2.6  (Solution  on  p.  669.) 

Suppose  that  one  individual  is  randomly  chosen.  The  probability  that  the  person's  favorite  pie  is 
apple  or  the  person  is  male  is: 
A. 40/60

B. 60/140

C. 120/140

D. 150/140

Exercise  14.2.7  (Solution  on  p.  669.) 

Suppose H0 is: Favorite pie type and gender are independent.

The p-value is:

A. ≈ 0

B.  1 

C.  0.05 

D.  cannot  be  determined 

The  next  two  questions  refer  to  the  following  situation: 

Let's  say  that  the  probability  that  an  adult  watches  the  news  at  least  once  per  week  is  0.60.  We  randomly 
survey  14  people.  Of  interest  is  the  number  that  watch  the  news  at  least  once  per  week. 

Exercise  14.2.8  (Solution  on  p.  669.) 

Which  of  the  following  statements  is  FALSE? 

A. X ~ B(14, 0.60)

B. The values for x are: {1, 2, 3, ..., 14}

C. μ = 8.4

D.  P  (X  =  5)  =  0.0408 

Exercise  14.2.9  (Solution  on  p.  669.) 

Find  the  probability  that  at  least  6  adults  watch  the  news. 

A  ^ 

B.  0.8499 

C.  0.9417 

D.  0.6429 
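
Binomial probabilities such as those in Exercises 14.2.8 and 14.2.9 can be computed from the probability mass function. This is a minimal sketch using only the standard library; binomial_pmf is an illustrative helper name.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 14, 0.60
mean = n * p                                   # expected number of news watchers
p_exactly_5 = binomial_pmf(5, n, p)
# "At least 6" is the complement of "5 or fewer"
p_at_least_6 = 1 - sum(binomial_pmf(k, n, p) for k in range(6))

print(round(mean, 1), round(p_exactly_5, 4), round(p_at_least_6, 4))
```

Note that the possible values of X run from 0 through 14, which is why the sum above starts at k = 0.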

Exercise  14.2.10  (Solution  on  p.  669.) 

The  following  histogram  is  most  likely  to  be  a  result  of  sampling  from  which  distribution? 
A. Chi-Square with df = 6

B.  Exponential 

C.  Uniform 

D.  Binomial 


The ages of campus day and evening students are known to be normally distributed. A sample of 6 campus
day and evening students reported their ages (in years) as: {18, 35, 27, 45, 20, 20}

Exercise  14.2.11  (Solution  on  p.  670.) 

What  is  the  error  bound  for  the  90%  confidence  interval  of  the  true  average  age? 

A.  11.2 

B.  22.3 

C.  17.5 

D.  8.7 
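
With a small sample and an unknown population standard deviation, the error bound uses the Student's-t distribution: EBM = t·s/sqrt(n). The sketch below hard-codes the critical value 2.015 (df = 5, 5% in each tail) as read from a t-table; that table lookup is an assumption here, since the standard library has no t-distribution.

```python
import math
from statistics import mean, stdev

ages = [18, 35, 27, 45, 20, 20]
n = len(ages)
x_bar = mean(ages)
s = stdev(ages)                      # sample standard deviation (divides by n - 1)

t_critical = 2.015                   # assumed t-table value for df = 5, 90% confidence
error_bound = t_critical * s / math.sqrt(n)

print(round(x_bar, 1), round(s, 2), round(error_bound, 1))
```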


Exercise  14.2.12  (Solution  on  p.  670.) 

If a normally distributed random variable has μ = 0 and σ = 1, then 97.5% of the population
values lie above:


A.  -1.96 

B.  1.96 

C.  1 

D.  -1 


The  next  three  questions  refer  to  the  following  situation: 

The amount of money a customer spends in one trip to the supermarket is known to have an exponential
distribution. Suppose the average amount of money a customer spends in one trip to the supermarket is
$72.

Exercise  14.2.13  (Solution  on  p.  670.) 

What  is  the  probability  that  one  customer  spends  less  than  $72  in  one  trip  to  the  supermarket? 

A.  0.6321 
B. 0.5000

C.  0.3714 

D.  1 


Exercise  14.2.14  (Solution  on  p.  670.) 

How much money altogether would you expect the next 5 customers to spend in one trip to the
supermarket (in dollars)?

A.  72 

C.  5184 

D.  360 


Exercise  14.2.15  (Solution  on  p.  670.) 

If  you  want  to  find  the  probability  that  the  mean  of  50  customers  is  less  than  $60,  the  distribution 
to  use  is: 

A.  N(72,72) 


C.  Exp  (72) 

D. Exp(1/72)


The  next  three  questions  refer  to  the  following  situation: 

The amount of time it takes a fourth grader to carry out the trash is uniformly distributed in the interval
from 1 to 10 minutes.

Exercise  14.2.16  (Solution  on  p.  670.) 

What  is  the  probability  that  a  randomly  chosen  fourth  grader  takes  more  than  7  minutes  to  take 
out  the  trash? 

A.  i 

B.  I 

C  ^ 
^-  10 

D-  TO 


Exercise  14.2.17  (Solution  on  p.  670.) 

Which graph best shows the probability that a randomly chosen fourth grader takes more than 6
minutes to take out the trash given that he/she has already taken more than 3 minutes?
[Answer choices a) through d) are graphs of f(x) on an axis from 0 to 10, drawn with heights of 1/7 or 1/9; the figures are not reproduced here.]


Exercise  14.2.18  (Solution  on  p.  670.) 

We  should  expect  a  fourth  grader  to  take  how  many  minutes  to  take  out  the  trash? 


A.  4.5 

B.  5.5 

C.  5 

D.  10 


The  next  three  questions  refer  to  the  following  situation: 

At the beginning of the quarter, the amount of time a student waits in line at the campus cafeteria is
normally distributed with a mean of 5 minutes and a standard deviation of 1.5 minutes.

Exercise  14.2.19  (Solution  on  p.  670.) 

What  is  the  90th  percentile  of  waiting  times  (in  minutes)? 

A.  1.28 

B.  90 

C.  7.47 

D.  6.92 


Exercise  14.2.20  (Solution  on  p.  670.) 

The  median  waiting  time  (in  minutes)  for  one  student  is: 

A.  5 

B.  50 

C.  2.5 

D.  1.5 


Exercise  14.2.21  (Solution  on  p.  670.) 

Find  the  probability  that  the  average  wait  time  of  10  students  is  at  most  5.5  minutes. 

A.  0.6301 

B.  0.8541 

C.  0.3694 
D. 0.1459
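
The three cafeteria questions can be checked with the normal distribution for one student and with the distribution of the sample mean, which has standard deviation σ/sqrt(n), for ten students. A minimal sketch under the model stated above (variable names are illustrative):

```python
import math
from statistics import NormalDist

wait = NormalDist(mu=5, sigma=1.5)             # one student's waiting time
percentile_90 = wait.inv_cdf(0.90)
median_wait = wait.inv_cdf(0.50)               # equals the mean for a normal curve

# Average wait of 10 students: same mean, standard deviation sigma / sqrt(n)
mean_of_10 = NormalDist(mu=5, sigma=1.5 / math.sqrt(10))
p_average_at_most_5_5 = mean_of_10.cdf(5.5)

print(round(percentile_90, 2), round(median_wait, 2), round(p_average_at_most_5_5, 4))
```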


Exercise  14.2.22  (Solution  on  p.  670.) 

A  sample  of  80  software  engineers  in  Silicon  Valley  is  taken  and  it  is  found  that  20%  of  them  earn 
approximately  $50,000  per  year.  A  point  estimate  for  the  true  proportion  of  engineers  in  Silicon 
Valley  who  earn  $50,000  per  year  is: 

A.  16 

B.  0.2 

C.  1 

D.  0.95 


Exercise  14.2.23  (Solution  on  p.  670.) 

If P(Z < z_α) = 0.1587 where Z ~ N(0, 1), then α is equal to:

A.  -1 

B.  0.1587 

C.  0.8413 

D.  1 


Exercise  14.2.24  (Solution  on  p.  670.) 

A professor tested 35 students to determine their entering skills. At the end of the term, after
completing the course, the same test was administered to the same 35 students to study their
improvement. This would be a test of:

A.  independent  groups 

B.  2  proportions 

C.  matched  pairs,  dependent  groups 

D.  exclusive  groups 


Exercise  14.2.25  (Solution  on  p.  670.) 

A  math  exam  was  given  to  all  the  third  grade  children  attending  ABC  School.  Two  random 
samples  of  scores  were  taken. 


 | n | x̄ | s
Boys | 55 | 82 | 5
Girls | 60 | 86 | 7

Table  14.10 


Which of the following correctly describes the results of a hypothesis test of the claim, "There is
a difference between the mean scores obtained by third grade girls and boys at the 5% level of
significance"?

A. Do not reject H0. There is insufficient evidence to conclude that there is a difference in the mean scores.

B. Do not reject H0. There is sufficient evidence to conclude that there is a difference in the mean scores.

C. Reject H0. There is insufficient evidence to conclude that there is no difference in the mean scores.

D. Reject H0. There is sufficient evidence to conclude that there is a difference in the mean scores.
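
The test statistic for two independent means is the difference in sample means divided by sqrt(s1²/n1 + s2²/n2). The sketch below computes it for Table 14.10. As a simplifying assumption, the p-value uses a standard normal approximation in place of the Student's-t distribution; with samples of 55 and 60 the two are close, but the result is only a rough check.

```python
import math
from statistics import NormalDist

n_boys, mean_boys, s_boys = 55, 82, 5
n_girls, mean_girls, s_girls = 60, 86, 7

standard_error = math.sqrt(s_boys**2 / n_boys + s_girls**2 / n_girls)
test_statistic = (mean_girls - mean_boys) / standard_error

# Two-tailed p-value, normal approximation (assumption: large samples)
p_value = 2 * (1 - NormalDist().cdf(abs(test_statistic)))
reject_null = p_value < 0.05

print(round(test_statistic, 2), reject_null)
```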
Exercise 14.2.26 (Solution on p. 670.)

In  a  survey  of  80  males,  45  had  played  an  organized  sport  growing  up.  Of  the  70  females  surveyed, 
25  had  played  an  organized  sport  growing  up.  We  are  interested  in  whether  the  proportion  for 
males  is  higher  than  the  proportion  for  females.  The  correct  conclusion  is: 

A. There is insufficient information to conclude that the proportion for males is the same as the proportion for females.

B. There is insufficient information to conclude that the proportion for males is not the same as the proportion for females.

C. There is sufficient evidence to conclude that the proportion for males is higher than the proportion for females.

D.  Not  enough  information  to  determine. 
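
A comparison of two proportions usually pools the samples to estimate the common proportion under the null hypothesis. The sketch below follows that approach for the sports survey. Because the exercise states no level of significance, the code reports the p-value and leaves the final decision to the reader.

```python
import math
from statistics import NormalDist

males, males_yes = 80, 45
females, females_yes = 70, 25

p_males = males_yes / males
p_females = females_yes / females
pooled = (males_yes + females_yes) / (males + females)

standard_error = math.sqrt(pooled * (1 - pooled) * (1 / males + 1 / females))
z = (p_males - p_females) / standard_error

# Right-tailed test: is the proportion for males higher?
p_value = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_value, 4))
```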

Exercise  14.2.27  (Solution  on  p.  670.) 

Note: Chi-Square Test of a Single Variance; Not all classes cover this topic. From past experience,
a statistics teacher has found that the average score on a midterm is 81 with a standard deviation
of 5.2. This term, a class of 49 students had a standard deviation of 5 on the midterm. Do the data
indicate that we should reject the teacher's claim that the standard deviation is 5.2? Use α = 0.05.

A.  Yes 

B.  No 

C.  Not  enough  information  given  to  solve  the  problem 

Exercise  14.2.28  (Solution  on  p.  670.) 

Note: F Distribution Test of ANOVA; Not all classes cover this topic. Three loading machines
are being compared. Ten samples were taken for each machine. Machine I took an average of 31
minutes to load packages with a standard deviation of 2 minutes. Machine II took an average of 28
minutes to load packages with a standard deviation of 1.5 minutes. Machine III took an average of
29 minutes to load packages with a standard deviation of 1 minute. Find the p-value when testing
that the average loading times are the same.

A.  the  p-value  is  close  to  0 

B.  p-value  is  close  to  1 

C.  Not  enough  information  given  to  solve  the  problem 


The  next  three  questions  refer  to  the  following  situation: 

A corporation has offices in different parts of the country. It has gathered the following information
concerning the number of bathrooms and the number of employees at seven sites:

Number of employees x | 650 | 730 | 810 | 900 | 102 | 107 | 1150
Number of bathrooms y | 40 | 50 | 54 | 61 | 82 | 110 | 121

Table  14.11 


Exercise  14.2.29  (Solution  on  p.  670.) 

Is  the  correlation  between  the  number  of  employees  and  the  number  of  bathrooms  significant? 

A.  Yes 

B.  No 

C.  Not  enough  information  to  answer  question 
Exercise 14.2.30 (Solution on p. 670.)

The  linear  regression  equation  is: 

A.  y  =  0.0094  -  79.96x 

B.  y  =  79.96  +  0.0094x 

C.  y  =  79.96  -  0.0094x 

D.  y  =  -0.0094  +  79.96x 
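
The least-squares line and the correlation coefficient for Table 14.11 can be computed from the usual sums of squares. The sketch below uses the table values exactly as printed; the function name least_squares is illustrative.

```python
import math

employees = [650, 730, 810, 900, 102, 107, 1150]
bathrooms = [40, 50, 54, 61, 82, 110, 121]

def least_squares(xs, ys):
    """Return (intercept, slope, r) for the line y-hat = intercept + slope * x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return intercept, slope, r

intercept, slope, r = least_squares(employees, bathrooms)
print(round(intercept, 2), round(slope, 4), round(r, 4))
```

A correlation coefficient this close to 0 is worth keeping in mind before using the line for any prediction.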

Exercise  14.2.31  (Solution  on  p.  670.) 

If  a  site  has  1150  employees,  approximately  how  many  bathrooms  should  it  have? 

A.  69 

B.  91 

C.  91,954 

D.  We  should  not  be  estimating  here. 

Exercise  14.2.32  (Solution  on  p.  670.) 

Note: Chi-Square Test of a Single Variance; Not all classes cover this topic. Suppose that a sample
of size 10 was collected, with x̄ = 4.4 and s = 1.4.

H0: σ² = 1.6 vs. Ha: σ² ≠ 1.6. Which graph best describes the results of the test?

[The answer choices are graphs; the figures are not reproduced here.]

Exercise  14.2.33  (Solution  on  p.  670.) 

64  backpackers  were  asked  the  number  of  days  their  latest  backpacking  trip  was.  The  number  of 
days  is  given  in  the  table  below: 


# of days | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8
Frequency | 5 | 9 | 6 | 12 | 7 | 10 | 5 | 10

Table  14.12 

Conduct  an  appropriate  test  to  determine  if  the  distribution  is  uniform. 

A.  The  p-value  is  >  0.10.  There  is  insufficient  information  to  conclude  that  the  distribution  is  not 

uniform. 

B.  The  p-value  is  <  0.01.  There  is  sufficient  information  to  conclude  the  distribution  is  not  uni- 

form. 

C.  The  p-value  is  between  0.01  and  0.10,  but  without  alpha  (a)  there  is  not  enough  information 


Available  for  free  at  Connexions  <http:/ /cnx.org/content/coll0522/1.40> 



D.  There  is  no  such  test  that  can  be  conducted. 
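
A goodness-of-fit test compares each observed frequency with the frequency expected under the claimed distribution, using the sum of (O - E)²/E. The sketch below computes that statistic for Table 14.12 under a uniform model. The critical value 12.017 (df = 7, right-tail area 0.10) is hard-coded from a chi-square table, which is an assumption of this example.

```python
observed = [5, 9, 6, 12, 7, 10, 5, 10]         # trips lasting 1 through 8 days

total = sum(observed)
expected = total / len(observed)               # uniform: every length equally likely

chi_square = sum((o - expected) ** 2 / expected for o in observed)
degrees_of_freedom = len(observed) - 1

critical_value_10_percent = 12.017             # assumed chi-square table value, df = 7
p_value_above_0_10 = chi_square < critical_value_10_percent

print(chi_square, degrees_of_freedom, p_value_above_0_10)
```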

Exercise  14.2.34  (Solution  on  p.  670.) 

Note: F Distribution test of One-Way ANOVA; Not all classes cover this topic. Which of the
following statements is true when using one-way ANOVA?

A.  The  populations  from  which  the  samples  are  selected  have  different  distributions. 

B.  The  sample  sizes  are  large. 

C.  The  test  is  to  determine  if  the  different  groups  have  the  same  means. 

D.  There  is  a  correlation  between  the  factors  of  the  experiment. 
14.3 Data Sets'
14.3.1  Lap  Times 

The  following  tables  provide  lap  times  from  Terri  Vogel's  Log  Book.  Times  are  recorded  in  seconds  for 
2.5-mile  laps  completed  in  a  series  of  races  and  practice  runs. 

Race  Lap  Times  (in  Seconds) 


 | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Lap 6 | Lap 7
Race 1 | 135 | 130 | 131 | 132 | 130 | 131 | 133
Race 2 | 134 | 131 | 131 | 129 | 128 | 128 | 129
Race 3 | 129 | 128 | 127 | 127 | 130 | 127 | 129
Race 4 | 125 | 125 | 126 | 125 | 124 | 125 | 125
Race 5 | 133 | 132 | 132 | 132 | 131 | 130 | 132
Race 6 | 130 | 130 | 130 | 129 | 129 | 130 | 129
Race 7 | 132 | 131 | 133 | 131 | 134 | 134 | 131
Race 8 | 127 | 128 | 127 | 130 | 128 | 126 | 128
Race 9 | 132 | 130 | 127 | 128 | 126 | 127 | 124
Race 10 | 135 | 131 | 131 | 132 | 130 | 131 | 130
Race 11 | 132 | 131 | 132 | 131 | 130 | 129 | 129
Race 12 | 134 | 130 | 130 | 130 | 131 | 130 | 130
Race 13 | 128 | 127 | 128 | 128 | 128 | 129 | 128
Race 14 | 132 | 131 | 131 | 131 | 132 | 130 | 130
Race 15 | 136 | 129 | 129 | 129 | 129 | 129 | 129
Race 16 | 129 | 129 | 129 | 128 | 128 | 129 | 129
Race 17 | 134 | 131 | 132 | 131 | 132 | 132 | 132
Race 18 | 129 | 129 | 130 | 130 | 133 | 133 | 127
Race 19 | 130 | 129 | 129 | 129 | 129 | 129 | 128
Race 20 | 131 | 128 | 130 | 128 | 129 | 130 | 130

Table  14.13 


^This  content  is  available  online  at  <http://caTx.org/content/ml7132/1.5/>. 
Practice Lap Times (in Seconds)

 | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Lap 6 | Lap 7
Practice 1 | 142 | 143 | 180 | 137 | 134 | 134 | 172
Practice 2 | 140 | 135 | 134 | 133 | 128 | 128 | 131
Practice 3 | 130 | 133 | 130 | 128 | 135 | 133 | 133
Practice 4 | 141 | 136 | 137 | 136 | 136 | 136 | 145
Practice 5 | 140 | 138 | 136 | 137 | 135 | 134 | 134
Practice 6 | 142 | 142 | 139 | 138 | 129 | 129 | 127
Practice 7 | 139 | 137 | 135 | 135 | 137 | 134 | 135
Practice 8 | 143 | 136 | 134 | 133 | 134 | 133 | 132
Practice 9 | 135 | 134 | 133 | 133 | 132 | 132 | 133
Practice 10 | 131 | 130 | 128 | 129 | 127 | 128 | 127
Practice 11 | 143 | 139 | 139 | 138 | 138 | 137 | 138
Practice 12 | 132 | 133 | 131 | 129 | 128 | 127 | 126
Practice 13 | 149 | 144 | 144 | 139 | 138 | 138 | 137
Practice 14 | 133 | 132 | 137 | 133 | 134 | 130 | 131
Practice 15 | 138 | 136 | 133 | 133 | 132 | 131 | 131

Table  14.14 


14.3.2  Stock  Prices 

The following table lists initial public offering (IPO) stock prices for all 1999 stocks that at least doubled in
value during the first day of trading. This is historical data.




IPO  Offer  Prices 


$17.00 

$23.00 

$14.00 

$16.00 

$12.00 

$26.00 

$20.00 

$22.00 

$14.00 

$15.00 

$22.00 

$18.00 

$18.00 

$21.00 

$21.00 

$19.00 

$15.00 

$21.00 

$18.00 

$17.00 

$15.00 

$25.00 

$14.00 

$30.00 

$16.00 

$10.00 

$20.00 

$12.00 

$16.00 

$17.44 

$16.00 

$14.00 

$15.00 

$20.00 

$20.00 

$16.00 

$17.00 

$16.00 

$15.00 

$15.00 

$19.00 

$48.00 

$16.00 

$18.00 

$9.00 

$18.00 

$18.00 

$20.00 

$8.00 

$20.00 

$17.00 

$14.00 

$11.00 

$16.00 

$19.00 

$15.00 

$21.00 

$12.00 

$8.00 

$16.00 

$13.00 

$14.00 

$15.00 

$14.00 

$13.41 

$28.00 

$21.00 

$17.00 

$28.00 

$17.00 

$19.00 

$16.00 

$17.00 

$19.00 

$18.00 

$17.00 

$15.00 

$14.00 

$21.00 

$12.00 

$18.00 

$24.00 

$15.00 

$23.00 

$14.00 

$16.00 

$12.00 

$24.00 

$20.00 

$14.00 

$14.00 

$15.00 

$14.00 

$19.00 

$16.00 

$38.00 

$20.00 

$24.00 

$16.00 

$8.00 

$18.00 

$17.00 

$16.00 

$15.00 

$7.00 

$19.00 

$12.00 

$8.00 

$23.00 

$12.00 

$18.00 

$20.00 

$21.00 

$34.00 

$16.00 

$26.00 

$14.00 

Table  14.15 


NOTE:  Data  compiled  by  Jay  R.  Ritter  of  Univ.  of  Florida  using  data  from  Securities  Data  Co.  and 
Bloomberg. 




14.4  Group  Projects 

14.4.1  Group  Project:  Univariate  Data* 

14.4.1.1  Student  Learning  Objectives 

•  The  student  will  design  and  carry  out  a  survey. 

• The student will analyze and graphically display the results of the survey.

14.4.1.2  Instructions 

As you complete each task below, check it off. Answer all questions in your summary.

____ Decide what data you are going to study.

EXAMPLES: Here are two examples, but you may NOT use them: number of M&M's per
small bag, number of pencils students have in their backpacks.

____ Are your data discrete or continuous? How do you know?

____ Decide how you are going to collect the data (for instance, buy 30 bags of M&M's; collect data from
the World Wide Web).

____ Describe your sampling technique in detail. Use cluster, stratified, systematic, or simple random
(using a random number generator) sampling. Do not use convenience sampling. What method did
you use? Why did you pick that method?

____ Conduct your survey. Your data size must be at least 30.

____ Summarize your data in a chart with columns showing data value, frequency, relative frequency
and cumulative relative frequency.

____ Answer the following (rounded to 2 decimal places):

1. x̄ =

2. s =

3. First quartile =

4. Median =

5. 70th percentile =

____ What value is 2 standard deviations above the mean?

____ What value is 1.5 standard deviations below the mean?

____ Construct a histogram displaying your data.

____ In complete sentences, describe the shape of your graph.

____ Do you notice any potential outliers? If so, what values are they? Show your work in how you used
the potential outlier formula in Chapter 2 (since you have univariate data) to determine whether or
not the values might be outliers.

____ Construct a box plot displaying your data.

____ Does the middle 50% of the data appear to be concentrated together or spread apart? Explain how
you determined this.

____ Looking at both the histogram and the box plot, discuss the distribution of your data.

14.4.1.3  Assignment  Checklist 

You need to turn in the following typed and stapled packet, with pages in the following order:

____ Cover sheet: name, class time, and name of your study

____ Summary page: This should contain paragraphs written with complete sentences. It should include
answers to all the questions above. It should also include statements describing the population under
study, the sample, a parameter or parameters being studied, and the statistic or statistics produced.

____ URL for data, if your data are from the World Wide Web.

____ Chart of data, frequency, relative frequency and cumulative relative frequency.

____ Page(s) of graphs: histogram and box plot.

*This content is available online at <http://caTx.org/content/ml7142/1.8/>.
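
The frequency chart, summary statistics, and potential-outlier check asked for in this project can be produced with a short script. The sketch below is one possible approach, not a required method: the sample list is hypothetical, the function names are illustrative, and the quartiles come from Python's "inclusive" interpolation rule, which may differ slightly from the quartiles your calculator or class method gives.

```python
from collections import Counter
from statistics import mean, stdev, quantiles

def frequency_table(data):
    """Rows of (value, frequency, relative frequency, cumulative relative frequency)."""
    n = len(data)
    rows, cumulative = [], 0.0
    for value, freq in sorted(Counter(data).items()):
        relative = freq / n
        cumulative += relative
        rows.append((value, freq, round(relative, 4), round(cumulative, 4)))
    return rows

def outlier_fences(data):
    """Potential-outlier cutoffs: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

sample = [2, 3, 3, 4, 4, 4, 5, 5, 6, 12]       # hypothetical survey results
table = frequency_table(sample)
low, high = outlier_fences(sample)
potential_outliers = [x for x in sample if x < low or x > high]

x_bar, s = mean(sample), stdev(sample)
two_sd_above = x_bar + 2 * s                   # value 2 standard deviations above the mean

for row in table:
    print(row)
print(round(x_bar, 2), round(s, 2), round(two_sd_above, 2), potential_outliers)
```

Replace the hypothetical list with your own data of at least 30 values before reporting any results.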
14.4.2 Group Project: Continuous Distributions and Central Limit Theorem^
14.4.2.1  Student  Learning  Objectives 

•  The  student  will  collect  a  sample  of  continuous  data. 

•  The  student  will  attempt  to  fit  the  data  sample  to  various  distribution  models. 

•  The  student  will  validate  the  Central  Limit  Theorem. 


14.4.2.2  Instructions 

As you complete each task below, check it off. Answer all questions in your summary.

14.4.2.3 Part I: Sampling

____ Decide what continuous data you are going to study. (Here are two examples, but you may NOT use
them: the amount of money a student spends on college supplies this term or the length of a long
distance telephone call.)

____ Describe your sampling technique in detail. Use cluster, stratified, systematic, or simple random
(using a random number generator) sampling. Do not use convenience sampling. What method did
you use? Why did you pick that method?

____ Conduct your survey. Gather at least 150 pieces of continuous quantitative data.

____ Define (in words) the random variable for your data. X = ____

____ Create 2 lists of your data: (1) unordered data, (2) in order of smallest to largest.

____ Find the sample mean and the sample standard deviation (rounded to 2 decimal places).

1. x̄ =

2. s =

____ Construct a histogram of your data containing 5-10 intervals of equal width. The histogram should
be a representative display of your data. Label and scale it.

14.4.2.4 Part II: Possible Distributions

____ Suppose that X followed the theoretical distributions below. Set up each distribution using the
appropriate information from your data.

____ Uniform: X ~ U ____ Use the lowest and highest values as a and b.

____ Exponential: X ~ Exp ____ Use x̄ to estimate μ.

____ Normal: X ~ N ____ Use x̄ to estimate for μ and s to estimate for σ.

____ Must your data fit one of the above distributions? Explain why or why not.

____ Could the data fit 2 or 3 of the above distributions (at the same time)? Explain.

____ Calculate the value k (an X value) that is 1.75 standard deviations above the sample mean. k =
____ (rounded to 2 decimal places) Note: k = x̄ + (1.75) * s

____ Determine the relative frequencies (RF) rounded to 4 decimal places.

1. RF = frequency / total number surveyed

2. RF(X < k) =

3. RF(X > k) =

4. RF(X = k) =

Use  a  separate  piece  of  paper  for  EACH  distribution  (uniform,  exponential,  normal)  to  respond  to  the 
following  questions. 

^This  content  is  available  online  at  <http://cnx.org/content/ml7141/1.9/>. 
NOTE: You should have one page for the uniform, one page for the exponential, and one page for
the normal.

  State  the  distribution:  X  ~  

 Draw  a  graph  for  each  of  the  three  theoretical  distributions.  Label  the  axes  and  mark  them  appropri- 
ately. 

  Find  the  following  theoretical  probabilities  (rounded  to  4  decimal  places). 

1.  P(X  <  k  )  = 

2.  P(X  >  k  )  = 

3. P(X = k) =

  Compare  the  relative  frequencies  to  the  corresponding  probabilities.  Are  the  values  close? 

  Does it appear that the data fit the distribution well? Justify your answer by comparing the
probabilities to the relative frequencies, and the histograms to the theoretical graphs.
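The comparison of relative frequencies with theoretical probabilities can be sketched in Python as follows. This is an illustration under stated assumptions, not the required method: the function names and data are hypothetical, the exponential decay rate is assumed to be m = 1 / x̄, and the normal distribution uses x̄ and s as described above.

```python
import math
import statistics

def relative_frequencies(data, k):
    """Observed proportions of the data below, above, and exactly at k."""
    n = len(data)
    return (
        round(sum(x < k for x in data) / n, 4),
        round(sum(x > k for x in data) / n, 4),
        round(sum(x == k for x in data) / n, 4),
    )

def probabilities_below(data, k):
    """P(X < k) under each candidate distribution, set up from the data."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    a, b = min(data), max(data)
    uniform = min(max((k - a) / (b - a), 0.0), 1.0)           # X ~ U(a, b)
    exponential = 1 - math.exp(-k / mean) if k > 0 else 0.0   # X ~ Exp(1 / mean)
    normal = statistics.NormalDist(mean, s).cdf(k)            # X ~ N(mean, s)
    return {
        "uniform": round(uniform, 4),
        "exponential": round(exponential, 4),
        "normal": round(normal, 4),
    }

# Hypothetical data; a real project needs at least 150 values.
data = [2.0, 3.5, 4.1, 4.8, 5.0, 5.2, 6.3, 7.7, 8.4, 12.0]
k = statistics.mean(data) + 1.75 * statistics.stdev(data)
rf_below, rf_above, rf_equal = relative_frequencies(data, k)
theory = probabilities_below(data, k)
print(rf_below, rf_above, rf_equal, theory)
```

For each continuous distribution, P(X > k) is 1 minus P(X < k), and P(X = k) is 0, so only the "below" probability needs to be computed.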

14.4.2.5  Part  III:  CLT  Experiments 

  From  your  original  data  (before  ordering),  use  a  random  number  generator  to  pick  40  samples  of 

size  5.  For  each  sample,  calculate  the  average. 
  On  a  separate  page,  attached  to  the  summary,  include  the  40  samples  of  size  5,  along  with  the  40 

sample  averages. 

  List  the  40  averages  in  order  from  smallest  to  largest. 

  Define the random variable, X̄, in words. X̄ =

  State the approximate theoretical distribution of X̄. X̄ ~

  Base  this  on  the  mean  and  standard  deviation  from  your  original  data. 

  Construct  a  histogram  displaying  your  data.  Use  5  to  6  intervals  of  equal  width.  Label  and  scale  it. 

Calculate the value k (an X̄ value) that is 1.75 standard deviations above the sample mean. k = ________

(rounded to 2 decimal places)
Determine the relative frequencies (RF) rounded to 4 decimal places.

1. RF(X̄ < k) =

2. RF(X̄ > k) =

3. RF(X̄ = k) =

Find  the  following  theoretical  probabilities  (rounded  to  4  decimal  places). 

1. P(X̄ < k) =

2. P(X̄ > k) =

3. P(X̄ = k) =

  Draw the graph of the theoretical distribution of X̄.

  Answer  the  questions  below. 

  Compare  the  relative  frequencies  to  the  probabilities.  Are  the  values  close? 

  Does it appear that the data of averages fit the distribution of X̄ well? Justify your answer by

comparing  the  probabilities  to  the  relative  frequencies,  and  the  histogram  to  the  theoretical  graph. 

  In  3  -  5  complete  sentences  for  each,  answer  the  following  questions.  Give  thoughtful  explanations. 

  In  summary,  do  your  original  data  seem  to  fit  the  uniform,  exponential,  or  normal  distributions? 

Answer  why  or  why  not  for  each  distribution.  If  the  data  do  not  fit  any  of  those  distributions,  explain 
why. 

  What  happened  to  the  shape  and  distribution  when  you  averaged  your  data?  In  theory,  what 

should  have  happened?  In  theory,  would  "it"  always  happen?  Why  or  why  not? 
  Were  the  relative  frequencies  compared  to  the  theoretical  probabilities  closer  when  comparing  the 

X or X̄ distributions? Explain your answer.
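The sampling step of the CLT experiment can be sketched in Python. This is a hedged illustration: the function names, the seed, and the stand-in data are hypothetical, and each sample of size 5 is assumed to be drawn without replacement from the original data.

```python
import math
import random
import statistics

def clt_experiment(data, num_samples=40, sample_size=5, seed=0):
    """Pick random samples from the data and return them with their sorted averages."""
    rng = random.Random(seed)  # fixed seed so the run can be repeated
    samples = [rng.sample(data, sample_size) for _ in range(num_samples)]
    averages = sorted(statistics.mean(sample) for sample in samples)
    return samples, averages

def theoretical_xbar(data, sample_size=5):
    """Approximate distribution of the sample average: N(mean, s / sqrt(n))."""
    return statistics.NormalDist(
        statistics.mean(data),
        statistics.stdev(data) / math.sqrt(sample_size),
    )

# Hypothetical stand-in for 150 pieces of collected data.
original = [float(v) for v in range(1, 151)]
samples, averages = clt_experiment(original)
xbar_dist = theoretical_xbar(original)
k = xbar_dist.mean + 1.75 * xbar_dist.stdev
rf_below = sum(avg < k for avg in averages) / len(averages)
print(round(rf_below, 4), round(xbar_dist.cdf(k), 4))
```

The last line prints the relative frequency of averages below k next to the theoretical probability, which is the comparison the questions above ask for.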
14.4.2.6 Assignment Checklist

You  need  to  turn  in  the  following  typed  and  stapled  packet,  with  pages  in  the  following  order: 
  Cover  sheet:  name,  class  time,  and  name  of  your  study 

  Summary pages: These should contain several paragraphs written with complete sentences that
describe the experiment, including what you studied and your sampling technique, as well as answers
to all of the questions above.

  URL  for  data,  if  your  data  are  from  the  World  Wide  Web. 

  Pages, one for each theoretical distribution, with the distribution stated, the graph, and the
probability questions answered

  Pages  of  the  data  requested 

  All  graphs  required 
14.4.3 Partner Project: Hypothesis Testing - Article

14.4.3.1  Student  Learning  Objectives 

•  The  student  will  identify  a  hypothesis  testing  problem  in  print. 

•  The  student  will  conduct  a  survey  to  verify  or  dispute  the  results  of  the  hypothesis  test. 

• The student will summarize the article, analysis, and conclusions in a report.

14.4.3.2  Instructions 

As you complete each task below, check it off. Answer all questions in your summary.

  Find an article in a newspaper, magazine or on the internet which makes a claim about ONE
population mean or ONE population proportion. The claim may be based upon a survey that the article was
reporting on. Decide whether this claim is the null or alternate hypothesis.

  Copy  or  print  out  the  article  and  include  a  copy  in  your  project,  along  with  the  source. 

  State  how  you  will  collect  your  data.  (Convenience  sampling  is  not  acceptable.) 

  Conduct  your  survey.  You  must  have  more  than  50  responses  in  your  sample.  When  you  hand  in 

your  final  project,  attach  the  tally  sheet  or  the  packet  of  questionnaires  that  you  used  to  collect  data. 
Your  data  must  be  real. 

  State  the  statistics  that  are  a  result  of  your  data  collection:  sample  size,  sample  mean,  and  sample 

standard  deviation,  OR  sample  size  and  number  of  successes. 
  Make  2  copies  of  the  appropriate  solution  sheet. 

  Record  the  hypothesis  test  on  the  solution  sheet,  based  on  your  experiment.  Do  a  DRAFT  solution 

first  on  one  of  the  solution  sheets  and  check  it  over  carefully.  Have  a  classmate  check  your  solution 
to  see  if  it  is  done  correctly.  Make  your  decision  using  a  5%  level  of  significance.  Include  the  95% 
confidence  interval  on  the  solution  sheet. 

  Create  a  graph  that  illustrates  your  data.  This  may  be  a  pie  or  bar  chart  or  may  be  a  histogram  or  box 

plot,  depending  on  the  nature  of  your  data.  Produce  a  graph  that  makes  sense  for  your  data  and  gives 
useful visual information about your data. You may need to look at several types of graphs before
you  decide  which  is  the  most  appropriate  for  the  type  of  data  in  your  project. 

  Write  your  summary  (in  complete  sentences  and  paragraphs,  with  proper  grammar  and  correct 

spelling)  that  describes  the  project.  The  summary  MUST  include: 

1.  Brief  discussion  of  the  article,  including  the  source. 

2.  Statement  of  the  claim  made  in  the  article  (one  of  the  hypotheses). 

3. Detailed description of how, where, and when you collected the data, including the sampling
technique. Did you use cluster, stratified, systematic, or simple random sampling (using a random
number generator)? As stated above, convenience sampling is not acceptable.

4. Conclusion about the article claim in light of your hypothesis test. This is the conclusion of your
hypothesis test, stated in words, in the context of the situation in your project in sentence form,
as if you were writing this conclusion for a non-statistician.

5. Sentence interpreting your confidence interval in the context of the situation in your project.
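For a claim about ONE population proportion, the test and the 95% confidence interval can be sketched as below. This is a minimal illustration under stated assumptions, not the official solution sheet: the function name and the survey counts are hypothetical, and the normal approximation to the sample proportion is assumed to be appropriate for the sample size.

```python
import math
import statistics

def one_proportion_test(successes, n, p0, alternative="two-sided"):
    """Return the z test statistic, the p-value, and a 95% confidence interval."""
    p_prime = successes / n                      # sample proportion
    z = (p_prime - p0) / math.sqrt(p0 * (1 - p0) / n)
    std_normal = statistics.NormalDist()         # Z ~ N(0, 1)
    if alternative == "less":
        p_value = std_normal.cdf(z)
    elif alternative == "greater":
        p_value = 1 - std_normal.cdf(z)
    else:
        p_value = 2 * (1 - std_normal.cdf(abs(z)))
    # Error bound for a proportion (EBP) at the 95% confidence level.
    ebp = std_normal.inv_cdf(0.975) * math.sqrt(p_prime * (1 - p_prime) / n)
    return z, p_value, (p_prime - ebp, p_prime + ebp)

# Hypothetical survey: 45 successes out of 60 responses, claimed proportion 0.50.
z, p_value, interval = one_proportion_test(45, 60, 0.50)
decision = "reject" if p_value < 0.05 else "do not reject"
print(round(z, 2), round(p_value, 4), decision, interval)
```

A claim about ONE population mean would normally use the Student-t distribution instead, which this sketch does not cover.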

14.4.3.3  Assignment  Checklist 

Turn  in  the  following  typed  (12  point)  and  stapled  packet  for  your  final  project: 

  Cover  sheet  containing  your  name(s),  class  time,  and  the  name  of  your  study. 

  Summary, which includes all items listed on summary checklist.

  Solution  sheet  neatly  and  completely  filled  out.  The  solution  sheet  does  not  need  to  be  typed. 

''This  content  is  available  online  at  <http://cnx.org/content/ml7140/1.8/>. 
  Graphic representation of your data, created following the guidelines discussed above. Include only
graphs  which  are  appropriate  and  useful. 

Raw  data  collected  AND  a  table  summarizing  the  sample  data  (n,  xbar  and  s;  or  x,  n,  and  p',  as 
appropriate  for  your  hypotheses).  The  raw  data  does  not  need  to  be  typed,  but  the  summary  does. 
Hand  in  the  data  as  you  collected  it.  (Either  attach  your  tally  sheet  or  an  envelope  containing  your 
questionnaires.) 
14.4.4 Partner Project: Hypothesis Testing - Word Problem

14.4.4.1  Student  Learning  Objectives 

•  The  student  will  write,  edit,  and  solve  a  hypothesis  testing  word  problem. 

14.4.4.2  Instructions 

Write an original hypothesis testing problem for either ONE population mean or ONE population
proportion. As you complete each task, check it off. Answer all questions in your summary. Look at the homework
for the Hypothesis Testing: Single Mean and Single Proportion chapter for examples (poems, two acts of a
play, a work related problem). The problems with names attached to them are problems written by students
in past quarters. Some other examples that are not in the homework include: a soccer hypothesis testing
poster, a cartoon, a news report, a children's story, a song.

 Your  problem  must  be  original  and  creative.  It  also  must  be  in  proper  English.  If  English  is  difficult 

for  you,  have  someone  edit  your  problem. 
  Your problem must be at least 1/2 page, typed and single spaced. This DOES NOT include the data.
Data will make the problem longer and that is fine. For this problem, the data and story may be real
or fictional.

  In the narrative of the problem, make it very clear what the null and alternative hypotheses are.

 Your  sample  size  must  be  LARGER  THAN  50  (even  if  it  is  fictional). 

  State  in  your  problem  how  you  will  collect  your  data. 

  Include your data with your word problem.

  State  the  statistics  that  are  a  result  of  your  data  collection:  sample  size,  sample  mean,  and  sample 

standard  deviation,  OR  sample  size  and  number  of  successes. 
  Create  a  graph  that  illustrates  your  problem.  This  may  be  a  pie  or  bar  chart  or  may  be  a  histogram 

or  box  plot,  depending  on  the  nature  of  your  data.  Produce  a  graph  that  makes  sense  for  your  data 

and  gives  useful  visual  information  about  your  data.  You  may  need  to  look  at  several  types  of  graphs 

before  you  decide  which  is  the  most  appropriate  for  your  problem. 
 Make  2  copies  of  the  appropriate  solution  sheet. 

  Record the hypothesis test on the solution sheet, based on your problem. Do a DRAFT solution first
on one of the solution sheets and check it over carefully. Make your decision using a 5% level of
significance. Include the 95% confidence interval on the solution sheet.


14.4.4.3  Assignment  Checklist 

You need to turn in the following typed (12 point) and stapled packet for your final project:

  Cover sheet containing your name, the name of your problem, and the date

  The  problem 

  Data  for  the  problem 

  Solution  sheet  neatly  and  completely  filled  out.  The  solution  sheet  does  not  need  to  be  typed. 

  Graphic  representation  of  the  data,  created  following  the  guidelines  discussed  above.  Include  only 

graphs  that  are  appropriate  and  useful. 
  Sentences  interpreting  the  results  of  the  hypothesis  test  and  the  confidence  interval  in  the  context 

of  the  situation  in  the  project. 


This  content  is  available  online  at  <http://cnx.org/content/ml7144/1.7/>. 
14.4.5 Group Project: Bivariate Data, Linear Regression, and Univariate Data

14.4.5.1  Student  Learning  Objectives 

•  The  students  will  collect  a  bivariate  data  sample  through  the  use  of  appropriate  sampling  techniques. 

•  The  student  will  attempt  to  fit  the  data  to  a  linear  model. 

•  The  student  will  determine  the  appropriateness  of  linear  fit  of  the  model. 

• The student will analyze and graph univariate data.

14.4.5.2  Instructions 

1. As you complete each task below, check it off. Answer all questions in your introduction or summary.

2.  Check  your  course  calendar  for  intermediate  and  final  due  dates. 

3. Graphs may be constructed by hand or by computer, unless your instructor informs you otherwise.

All  graphs  must  be  neat  and  accurate. 

4.  All  other  responses  must  be  done  on  the  computer. 

5. Neatness and quality of explanations are used to determine your final grade.

14.4.5.3  Part  I:  Bivariate  Data 
Introduction 

  State  the  bivariate  data  your  group  is  going  to  study. 

Examples:  Here  are  two  examples,  but  you  may  NOT  use  them:  height  vs.  weight  and  age 
vs.  running  distance. 

  Describe  how  your  group  is  going  to  collect  the  data  (for  instance,  collect  data  from  the  web,  survey 

students  on  campus). 

  Describe your sampling technique in detail. Use cluster, stratified, systematic, or simple random
sampling (using a random number generator). Convenience sampling is NOT acceptable.

  Conduct your survey. Your number of pairs must be at least 30.

 Print  out  a  copy  of  your  data. 

Analysis 

 On  a  separate  sheet  of  paper  construct  a  scatter  plot  of  the  data.  Label  and  scale  both  axes. 

  State  the  least  squares  line  and  the  correlation  coefficient. 

  On  your  scatter  plot,  in  a  different  color,  construct  the  least  squares  line. 

  Is  the  correlation  coefficient  significant?  Explain  and  show  how  you  determined  this. 

  Interpret  the  slope  of  the  linear  regression  line  in  the  context  of  the  data  in  your  project.  Relate  the 

explanation  to  your  data,  and  quantify  what  the  slope  tells  you. 
  Does  the  regression  line  seem  to  fit  the  data?  Why  or  why  not?  If  the  data  does  not  seem  to  be  linear, 

explain  if  any  other  model  seems  to  fit  the  data  better. 
  Are  there  any  outliers?  If  so,  what  are  they?  Show  your  work  in  how  you  used  the  potential  outlier 

formula  in  the  Linear  Regression  and  Correlation  chapter  (since  you  have  bivariate  data)  to  determine 

whether  or  not  any  pairs  might  be  outliers. 
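The least squares line, the correlation coefficient, and a potential-outlier check can be sketched in Python. This is an illustration under assumptions: the function names and the data pairs are hypothetical, and the outlier rule is assumed to flag a pair when its residual is more than 1.9s from the line, with s = sqrt(SSE / (n - 2)), matching the "1.9s" cut-off listed in the symbols table.

```python
import math

def least_squares(xs, ys):
    """Return intercept a, slope b, and correlation coefficient r for y-hat = a + bx."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

def potential_outliers(xs, ys, a, b):
    """Return the pairs whose residual exceeds 1.9 * s in absolute value."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sse = sum(e ** 2 for e in residuals)         # sum of squared errors
    s = math.sqrt(sse / (len(xs) - 2))
    return [(x, y) for x, y, e in zip(xs, ys, residuals) if abs(e) > 1.9 * s]

# Hypothetical pairs: y = 2x except for one unusual point at x = 5.
xs = list(range(1, 11))
ys = [2, 4, 6, 8, 30, 12, 14, 16, 18, 20]
a, b, r = least_squares(xs, ys)
flagged = potential_outliers(xs, ys, a, b)
print(round(a, 2), round(b, 2), round(r, 4), flagged)
```

Whether r is significant still has to be judged with the method from the Linear Regression and Correlation chapter; this sketch only computes the values.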


*This content is available online at <http://cnx.org/content/ml7143/1.6/>.
14.4.5.4 Part II: Univariate Data

In  this  section,  you  will  use  the  data  for  ONE  variable  only.  Pick  the  variable  that  is  more  interesting  to 
analyze.  For  example:  if  your  independent  variable  is  sequential  data  such  as  year  with  30  years  and  one 
piece  of  data  per  year,  your  x-values  might  be  1971, 1972, 1973, 1974, . . .,  2000.  This  would  not  be  interesting 
to  analyze.  In  that  case,  choose  to  use  the  dependent  variable  to  analyze  for  this  part  of  the  project. 

  Summarize  your  data  in  a  chart  with  columns  showing  data  value,  frequency,  relative  frequency, 

and  cumulative  relative  frequency. 
 Answer  the  following,  rounded  to  2  decimal  places: 

1.  Sample  mean  = 

2.  Sample  standard  deviation  = 

3.  First  quartile  = 

4.  Third  quartile  = 

5.  Median  = 

6.  70th  percentile  = 

7.  Value  that  is  2  standard  deviations  above  the  mean  = 

8.  Value  that  is  1.5  standard  deviations  below  the  mean  = 

  Construct  a  histogram  displaying  your  data.  Group  your  data  into  6-10  intervals  of  equal  width. 

Pick  regularly  spaced  intervals  that  make  sense  in  relation  to  your  data.  For  example,  do  NOT  group 
data  by  age  as  20-26,27-33,34-40,41-47,48-54,55-61 .  .  .  Instead,  maybe  use  age  groups  19.5-24.5,  24.5- 
29.5, ...  or  19.5-29.5, 29.5-39.5,  39.5-49.5, .  .  . 

 In  complete  sentences,  describe  the  shape  of  your  histogram. 

  Are  there  any  potential  outliers?  Which  values  are  they?  Show  your  work  and  calculations  as  to 

how  you  used  the  potential  outlier  formula  in  chapter  2  (since  you  are  now  using  univariate  data)  to 
determine  which  values  might  be  outliers. 

  Construct  a  box  plot  of  your  data. 

  Does  the  middle  50%  of  your  data  appear  to  be  concentrated  together  or  spread  out?  Explain  how 

you  determined  this. 

  Looking at both the histogram AND the box plot, discuss the distribution of your data. For example:
how does the spread of the middle 50% of your data compare to the spread of the rest of the data
represented in the box plot; how does this correspond to your description of the shape of the histogram;
how does the graphical display show any outliers you may have found; does the histogram show any
gaps in the data that are not visible in the box plot; are there any interesting features of your data that
you should point out.
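The quartiles and the potential-outlier check for univariate data can be sketched in Python. The code below is an illustration under assumptions: the function names and data are hypothetical, the quartiles are taken as the medians of the lower and upper halves of the ordered data (calculators and software may use slightly different conventions), and the outlier rule is assumed to be the usual 1.5 * IQR fence.

```python
import statistics

def five_number_summary(data):
    """Return minimum, first quartile, median, third quartile, and maximum."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd number of values, the median itself is left out of both halves.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (
        ordered[0],
        statistics.median(lower),
        statistics.median(ordered),
        statistics.median(upper),
        ordered[-1],
    )

def potential_outliers(data):
    """Return values more than 1.5 * IQR below Q1 or above Q3."""
    _, q1, _, q3, _ = five_number_summary(data)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]

# Hypothetical data with one unusually large value.
values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]
summary = five_number_summary(values)
two_sd_above = statistics.mean(values) + 2 * statistics.stdev(values)
print(summary, potential_outliers(values), round(two_sd_above, 2))
```

The five numbers returned are the ones needed to draw the box plot, and the width Q3 - Q1 shows how concentrated the middle 50% of the data is.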

14.4.5.5  Due  Dates 

•  Part  I,  Intro:  (keep  a  copy  for  your  records) 

• Part I, Analysis: (keep a copy for your records)

•  Entire  Project,  typed  and  stapled:  

  Cover  sheet:  names,  class  time,  and  name  of  your  study. 

  Part  I:  label  the  sections  "Intro"  and  "Analysis." 

  Part  II: 

  Summary page containing several paragraphs written in complete sentences describing the
experiment, including what you studied and how you collected your data. The summary page
should also include answers to ALL the questions asked above.

  All  graphs  requested  in  the  project. 

  All  calculations  requested  to  support  questions  in  data. 

  Description: what you learned by doing this project, what challenges you had, how you
overcame the challenges.
14.5 Solution Sheets

14.5.1  Solution  Sheet:  Hypothesis  Testing  for  Single  Mean  and  Single  Proportion' 

Class  Time: 
Name: 

a. Ho:

b. Ha:

c.  In  words,  CLEARLY  state  what  your  random  variable  X  or  P'  represents. 

d.  State  the  distribution  to  use  for  the  test. 

e.  What  is  the  test  statistic? 

f.  What  is  the  p-value?  In  1  -  2  complete  sentences,  explain  what  the  p-value  means  for  this  problem. 

g. Use the previous information to sketch a picture of this situation. CLEARLY label and scale the
horizontal axis and shade the region(s) corresponding to the p-value.


Figure  14.1 


h.  Indicate  the  correct  decision  ("reject"  or  "do  not  reject"  the  null  hypothesis),  the  reason  for  it,  and  write 

an  appropriate  conclusion,  using  complete  sentences. 

i.  Alpha: 

ii.  Decision: 

iii.  Reason  for  decision: 

iv.  Conclusion: 

i.  Construct  a  95%  Confidence  Interval  for  the  true  mean  or  proportion.  Include  a  sketch  of  the  graph  of 

the  situation.  Label  the  point  estimate  and  the  lower  and  upper  bounds  of  the  Confidence  Interval. 


Figure  14.2 


'This content is available online at <http://cnx.org/content/ml7134/1.6/>.
14.5.2 Solution Sheet: Hypothesis Testing for Two Means, Paired Data, and Two
Proportions

Class  Time: 
Name: 

a. Ho:

b. Ha:

c. In words, clearly state what your random variable X̄1 - X̄2, P1' - P2', or X̄d represents.

d.  State  the  distribution  to  use  for  the  test. 

e.  What  is  the  test  statistic? 

f.  What  is  the  p-value?  In  1  -  2  complete  sentences,  explain  what  the  p-value  means  for  this  problem. 

g. Use the previous information to sketch a picture of this situation. CLEARLY label and scale the
horizontal axis and shade the region(s) corresponding to the p-value.


Figure  14.3 


h. Indicate the correct decision ("reject" or "do not reject" the null hypothesis), the reason for it, and write

an  appropriate  conclusion,  using  complete  sentences. 

i.  Alpha: 

ii.  Decision: 

iii. Reason for decision:
iv.  Conclusion: 

i.  In  complete  sentences,  explain  how  you  determined  which  distribution  to  use. 


^"This  content  is  available  online  at  <http: / /cnx.org/content/ml7133/1.6/>. 
14.5.3 Solution Sheet: The Chi-Square Distribution

Class  Time: 
Name: 

a. Ho:

b. Ha:

c.  What  are  the  degrees  of  freedom? 

d.  State  the  distribution  to  use  for  the  test. 

e.  What  is  the  test  statistic? 

f .  What  is  the  p-value?  In  1  -  2  complete  sentences,  explain  what  the  p-value  means  for  this  problem. 

g.  Use  the  previous  information  to  sketch  a  picture  of  this  situation.  Clearly  label  and  scale  the  horizontal 

axis  and  shade  the  region(s)  corresponding  to  the  p-value. 


Figure  14.4 


h. Indicate the correct decision ("reject" or "do not reject" the null hypothesis) and write appropriate
conclusions, using complete sentences.

i.  Alpha: 

ii.  Decision: 

iii.  Reason  for  decision: 

iv.  Conclusion: 
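The test statistic and degrees of freedom requested on this sheet can be sketched in Python for a goodness-of-fit test. This is a minimal illustration: the function names and the counts are hypothetical, and the degrees of freedom are assumed to be the number of categories minus 1 (a test of independence uses a different count). The p-value is not computed here.

```python
def chi_square_statistic(observed, expected):
    """Sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

def goodness_of_fit_df(observed):
    """Degrees of freedom for a goodness-of-fit test: categories minus 1."""
    return len(observed) - 1

# Hypothetical counts for three categories.
observed = [10, 20, 30]
expected = [20, 20, 20]
statistic = chi_square_statistic(observed, expected)
df = goodness_of_fit_df(observed)
print(statistic, df)
```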


This  content  is  available  online  at  <http:/ / cnx.org/content/ ml7136/1.6/ >. 
14.5.4 Solution Sheet: F Distribution and One-Way ANOVA

Class Time:
Name:

a. Ho:

b. Ha:

c. df(n) = ________  df(d) = ________


d.  State  the  distribution  to  use  for  the  test. 

e.  What  is  the  test  statistic? 

f .  What  is  the  p-value? 

g.  Use  the  previous  information  to  sketch  a  picture  of  this  situation.  Clearly  label  and  scale  the  horizontal 

axis  and  shade  the  region(s)  corresponding  to  the  p-value. 


h. Indicate the correct decision ("reject" or "do not reject" the null hypothesis) and write appropriate
conclusions, using complete sentences.

i.  Alpha: 

ii.  Decision: 

iii.  Reason  for  decision: 

iv.  Conclusion: 
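The values df(n), df(d), and the F test statistic for a one-way ANOVA can be sketched in Python. This is an illustration with a hypothetical function name and hypothetical data; the p-value is not computed, since it needs the F distribution from a calculator or statistics package.

```python
def one_way_anova(groups):
    """Return the F ratio, df(n), and df(d) for a list of groups of values."""
    k = len(groups)
    n_total = sum(len(group) for group in groups)
    grand_mean = sum(sum(group) for group in groups) / n_total
    means = [sum(group) / len(group) for group in groups]
    # Variation between the group means.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation within each group.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_n = k - 1            # degrees of freedom for the numerator
    df_d = n_total - k      # degrees of freedom for the denominator
    f_ratio = (ss_between / df_n) / (ss_within / df_d)
    return f_ratio, df_n, df_d

# Hypothetical data: three groups of three values each.
f_ratio, df_n, df_d = one_way_anova([[1, 2, 3], [2, 3, 4], [3, 4, 5]])
print(f_ratio, df_n, df_d)
```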


Figure  14.5 


^^This  content  is  available  online  at  <http://cnx.org/content/ml7135/1.7/>. 
When the English says: | Interpret this as:
X is at least 4. | X ≥ 4
The minimum of X is 4. | X ≥ 4
X is no less than 4. | X ≥ 4
X is greater than or equal to 4. | X ≥ 4
X is at most 4. | X ≤ 4
The maximum of X is 4. | X ≤ 4
X is no more than 4. | X ≤ 4
X is less than or equal to 4. | X ≤ 4
X does not exceed 4. | X ≤ 4
X is greater than 4. | X > 4
X is more than 4. | X > 4
X exceeds 4. | X > 4
X is less than 4. | X < 4
There are fewer X than 4. | X < 4
X is 4. | X = 4
X is equal to 4. | X = 4
X is the same as 4. | X = 4
X is not 4. | X ≠ 4
X is not equal to 4. | X ≠ 4
X is not the same as 4. | X ≠ 4
X is different than 4. | X ≠ 4

Table 14.16


^^This content is available online at <http://cnx.org/content/ml6307/1.6/>.
14.7 Symbols and their Meanings

Symbols and their Meanings

Chapter (1st used) | Symbol | Spoken | Meaning
Sampling and Data | √ | The square root of | same
Sampling and Data | π | Pi | 3.14159... (a specific number)
Descriptive Statistics | Q1 | Quartile one | the first quartile
Descriptive Statistics | Q2 | Quartile two | the second quartile
Descriptive Statistics | Q3 | Quartile three | the third quartile
Descriptive Statistics | IQR | inter-quartile range | Q3-Q1=IQR
Descriptive Statistics | x̄ | x-bar | sample mean
Descriptive Statistics | μ | mu | population mean
Descriptive Statistics | s, s_x, sx | s | sample standard deviation
Descriptive Statistics | s², s_x² | s-squared | sample variance
Descriptive Statistics | σ, σ_x, σx | sigma | population standard deviation
Descriptive Statistics | σ², σ_x² | sigma-squared | population variance
Descriptive Statistics | Σ | capital sigma | sum
Probability Topics | {} | brackets | set notation
Probability Topics | S | S | sample space
Probability Topics | A | Event A | event A
Probability Topics | P(A) | probability of A | probability of A occurring
Probability Topics | P(A|B) | probability of A given B | prob. of A occurring given B has occurred
Probability Topics | P(A or B) | prob. of A or B | prob. of A or B or both occurring
Probability Topics | P(A and B) | prob. of A and B | prob. of both A and B occurring (same time)
Probability Topics | A' | A-prime, complement of A | complement of A, not A
Probability Topics | P(A') | prob. of complement of A | same
Probability Topics | G1 | green on first pick | same
Probability Topics | P(G1) | prob. of green on first pick | same
Discrete Random Variables | PDF | prob. distribution function | same
Discrete Random Variables | X | X | the random variable X
Discrete Random Variables | X ~ | the distribution of X | same
Discrete Random Variables | B | binomial distribution | same
Discrete Random Variables | G | geometric distribution | same
Discrete Random Variables | H | hypergeometric dist. | same
Discrete Random Variables | P | Poisson dist. | same
Discrete Random Variables | λ | Lambda | average of Poisson distribution
Discrete Random Variables | ≥ | greater than or equal to | same
Discrete Random Variables | ≤ | less than or equal to | same
Discrete Random Variables | = | equal to | same
Discrete Random Variables | ≠ | not equal to | same
Continuous Random Variables | f(x) | f of x | function of x
Continuous Random Variables | pdf | prob. density function | same
Continuous Random Variables | U | uniform distribution | same
Continuous Random Variables | Exp | exponential distribution | same
Continuous Random Variables | k | k | critical value
Continuous Random Variables | f(x) = | f of x equals | same
Continuous Random Variables | m | m | decay rate (for exp. dist.)
The Normal Distribution | N | normal distribution | same
The Normal Distribution | z | z-score | same
The Normal Distribution | Z | standard normal dist. | same
The Central Limit Theorem | CLT | Central Limit Theorem | same
The Central Limit Theorem | X̄ | X-bar | the random variable X-bar
The Central Limit Theorem | μ_x | mean of X | the average of X
The Central Limit Theorem | μ_x̄ | mean of X-bar | the average of X-bar
The Central Limit Theorem | σ_x | standard deviation of X | same
The Central Limit Theorem | σ_x̄ | standard deviation of X-bar | same
The Central Limit Theorem | ΣX | sum of X | same
The Central Limit Theorem | Σx | sum of x | same
Confidence Intervals | CL | confidence level | same
Confidence Intervals | CI | confidence interval | same
Confidence Intervals | EBM | error bound for a mean | same
Confidence Intervals | EBP | error bound for a proportion | same
Confidence Intervals | t | student-t distribution | same
Confidence Intervals | df | degrees of freedom | same
Confidence Intervals | t_(α/2) | student-t with α/2 area in right tail | same
Confidence Intervals | p', p̂ | p-prime; p-hat | sample proportion of success
Confidence Intervals | q', q̂ | q-prime; q-hat | sample proportion of failure
Hypothesis Testing | Ho | H-naught, H-sub 0 | null hypothesis
Hypothesis Testing | Ha | H-a, H-sub a | alternate hypothesis
Hypothesis Testing | H1 | H-1, H-sub 1 | alternate hypothesis
Hypothesis Testing | α | alpha | probability of Type I error
Hypothesis Testing | β | beta | probability of Type II error
Hypothesis Testing | X̄1 - X̄2 | X1-bar minus X2-bar | difference in sample means
 | μ1 - μ2 | mu-1 minus mu-2 | difference in population means
 | P'1 - P'2 | P1-prime minus P2-prime | difference in sample proportions
 | p1 - p2 | p1 minus p2 | difference in population proportions
Chi-Square Distribution | χ² | Ky-square | Chi-square
 | O | Observed | Observed frequency
 | E | Expected | Expected frequency
Linear Regression and Correlation | y = a + bx | y equals a plus b-x | equation of a line
 | ŷ | y-hat | estimated value of y
 | r | correlation coefficient | same
 | ε | error | same
 | SSE | Sum of Squared Errors | same
 | 1.9s | 1.9 times s | cut-off value for outliers
F-Distribution and ANOVA | F | F-ratio | F ratio

Table 14.17
14.8 Formulas

Formula  14.1:  Factorial 
$n! = n(n-1)(n-2)\cdots(1)$

0!  =  1 

Formula  14.2:  Combinations 

$\binom{n}{r} = \dfrac{n!}{(n-r)!\,r!}$

Formula  14.3:  Binomial  Distribution 

$X \sim B(n, p)$

$P(X = x) = \binom{n}{x} p^{x} q^{n-x}$, for $x = 0, 1, 2, \ldots, n$

Formula  14.4:  Geometric  Distribution 

$X \sim G(p)$

$P(X = x) = q^{x-1} p$, for $x = 1, 2, 3, \ldots$

Formula 14.5: Hypergeometric Distribution
$X \sim H(r, b, n)$

$P(X = x) = \dfrac{\binom{r}{x}\binom{b}{n-x}}{\binom{r+b}{n}}$


Formula  14.6:  Poisson  Distribution 

$X \sim P(\mu)$

Formula  14.7:  Uniform  Distribution 

$X \sim U(a, b)$

$f(x) = \dfrac{1}{b-a}$, $a < x < b$

Formula  14.8:  Exponential  Distribution 

$X \sim Exp(m)$

$f(x) = m e^{-mx}$, $m > 0$, $x \geq 0$

Formula  14.9:  Normal  Distribution 

$X \sim N(\mu, \sigma^{2})$

$f(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}$, $-\infty < x < \infty$

Formula  14.10:  Gamma  Function 

$\Gamma(m+1) = m!$ for $m$, a nonnegative integer

otherwise: $\Gamma(a+1) = a\,\Gamma(a)$

Formula  14.11:  Student-t  Distribution 

$X \sim t_{df}$

^^This  content  is  available  online  at  <http://caTx.org/content/ml6301/1.7/>. 
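$f(x) = \dfrac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{n\pi}\,\Gamma\left(\frac{n}{2}\right)} \left(1 + \dfrac{x^{2}}{n}\right)^{-\frac{n+1}{2}}$

$X = \dfrac{Z}{\sqrt{Y/n}}$

$Z \sim N(0, 1)$, $Y \sim \chi^{2}_{df}$, $n$ = degrees of freedom
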
Formula  14.12:  Chi-Square  Distribution 

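$X \sim \chi^{2}_{df}$

$f(x) = \dfrac{x^{\frac{n-2}{2}} e^{-\frac{x}{2}}}{2^{\frac{n}{2}}\,\Gamma\left(\frac{n}{2}\right)}$, $x > 0$, $n$ = positive integer and degrees of freedom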

Formula  14.13:  F  Distribution 

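$X \sim F_{df(n),\,df(d)}$

$df(n)$ = degrees of freedom for the numerator
$df(d)$ = degrees of freedom for the denominator

$f(x) = \dfrac{\Gamma\left(\frac{u+v}{2}\right)}{\Gamma\left(\frac{u}{2}\right)\Gamma\left(\frac{v}{2}\right)} \left(\dfrac{u}{v}\right)^{\frac{u}{2}} x^{\frac{u}{2}-1} \left[1 + \left(\dfrac{u}{v}\right)x\right]^{-0.5(u+v)}$, where $u = df(n)$ and $v = df(d)$

$X = \dfrac{Y/u}{W/v}$, where $Y$ and $W$ are chi-square with $u$ and $v$ degrees of freedom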
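Several of the formulas above translate directly into code. The following Python sketch is illustrative only: the function names are hypothetical, q is taken as 1 - p, and for the hypergeometric distribution r and b are assumed to be the sizes of the two groups and n the sample size.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def geometric_pmf(x, p):
    """P(X = x) for X ~ G(p), x = 1, 2, 3, ..."""
    return (1 - p) ** (x - 1) * p

def hypergeometric_pmf(x, r, b, n):
    """P(X = x) for X ~ H(r, b, n)."""
    return math.comb(r, x) * math.comb(b, n - x) / math.comb(r + b, n)

def uniform_pdf(x, a, b):
    """f(x) for X ~ U(a, b)."""
    return 1 / (b - a) if a < x < b else 0.0

def exponential_pdf(x, m):
    """f(x) for X ~ Exp(m), where m is the decay rate."""
    return m * math.exp(-m * x) if x >= 0 else 0.0

def normal_pdf(x, mu, sigma):
    """f(x) for X ~ N(mu, sigma^2)."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    return coefficient * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

print(binomial_pmf(2, 4, 0.5), geometric_pmf(3, 0.5), normal_pdf(0, 0, 1))
```

Note that `math.comb` and `math.factorial` correspond to Formulas 14.2 and 14.1; `math.comb` requires Python 3.8 or later.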
14.9 Notes for the TI-83, 83+, 84 Calculator

14.9.1  Quick  Tips 
Legend 

• A key name in capitals (ENTER, for example) represents a button press

• [ ] represents yellow command or green letter behind a key

• < > represents items on the screen

To  adjust  the  contrast 

Press 2nd, then hold the up arrow to increase the contrast or the down arrow to decrease the contrast.

To  capitalize  letters  and  words 

Press ALPHA to get one capital letter, or press 2nd, then ALPHA to set all button presses to capital letters. You can return to the top-level button values by pressing ALPHA again.


To  correct  a  mistake   

If you hit a wrong button, just hit CLEAR and start again.


To  write  in  scientific  notation 

Numbers in scientific notation are expressed on the TI-83, 83+, and 84 using E notation, such that...

• 4.321 E 4 = $4.321 \times 10^{4}$

• 4.321 E -4 = $4.321 \times 10^{-4}$

To  transfer  programs  or  equations  from  one  calculator  to  another: 

Both calculators: Insert your respective end of the link cable and press 2nd, then [LINK].

Calculator receiving information:

Step 1. Use the arrows to navigate to and select <RECEIVE>.
Step 2. Press ENTER.

Calculator sending information:

Step 1. Press the appropriate number or letter.
Step 2. Use the up and down arrows to access the appropriate item.
Step 3. Press ENTER to select the item to transfer.
Step 4. Press the right arrow to navigate to and select <TRANSMIT>.
Step 5. Press ENTER.

NOTE: ERROR 35 LINK generally means that the cables have not been inserted far enough.

Both calculators: Press 2nd, then [QUIT] to exit when done.


This content is available online at <http://cnx.org/content/m19710/1.7/>.
14.9.2 Manipulating One-Variable Statistics

NOTE:  These  directions  are  for  entering  data  with  the  built-in  statistical  program. 

Sample Data

Data    Frequency
-2      10
-1      3
0       4
1       5
3       8

Table 14.18: We are manipulating 1-variable statistics.

To  begin: 

Step 1. Turn on the calculator.

Step 2. Access statistics mode.

Step 3. Select <4:ClrList> to clear data from lists, if desired.
4, ENTER

Step 4. Enter list [L1] to be cleared.
2nd, [L1], ENTER

Step 5. Display last instruction.
2nd, [ENTRY]

Step 6. Continue clearing remaining lists in the same fashion, if desired.
2nd, [L2], ENTER

Step 7. Access statistics mode.

Step 8. Select <1:Edit...>
ENTER

Step 9. Enter data. Data values go into [L1]. (You may need to arrow over to [L1].)

• Type in a data value and enter it. (For negative numbers, use the negate (-) key at the bottom of the keypad.)
(-), 9, ENTER

• Continue in the same manner until all data values are entered.

Step 10. In [L2], enter the frequencies for each data value in [L1].

• Type in a frequency and enter it. (If a data value appears only once, the frequency is "1".)

• Continue in the same manner until all data values are entered.

Step 11. Access statistics mode.

Step 12. Navigate to <CALC>.

Step 13. Access <1:1-var Stats>.
ENTER

Step 14. Indicate that the data is in [L1]...
2nd, [L1], then the comma key

Step 15. ...and indicate that the frequencies are in [L2].
2nd, [L2], ENTER

Step 16. The statistics should be displayed. You may arrow down to get remaining statistics. Repeat as necessary.
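
The calculator results for the sample data in Table 14.18 can be cross-checked off the calculator. The following is a minimal Python sketch using the standard `statistics` module. It expands the frequency table into a flat list of 30 values, which is a convenience choice here and not the calculator's own method.

```python
import statistics

# Data values and their frequencies from Table 14.18.
data = [-2, -1, 0, 1, 3]
freq = [10, 3, 4, 5, 8]

# Repeat each data value as many times as its frequency.
values = [x for x, f in zip(data, freq) for _ in range(f)]

n = len(values)
mean = statistics.mean(values)
sample_sd = statistics.stdev(values)       # divides by n - 1
population_sd = statistics.pstdev(values)  # divides by n
median = statistics.median(values)

print(n, mean, sample_sd, population_sd, median)
```

The calculator reports both a sample standard deviation (Sx) and a population standard deviation (σx); `stdev` and `pstdev` correspond to those two quantities.
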


14.9.3  Drawing  Histograms 

NOTE: We will assume that the data is already entered.

We  will  construct  2  histograms  with  the  built-in  STATPLOT  application.  The  first  way  will  use  the  default 
ZOOM.  The  second  way  will  involve  customizing  a  new  graph. 

Step 1. Access graphing mode.
2nd, [STAT PLOT]

Step 2. Select <1:Plot 1> to access plotting for the first graph.
ENTER

Step 3. Use the arrows to navigate to <ON> to turn on Plot 1.
<ON>, ENTER

Step 4. Use the arrows to go to the histogram picture and select the histogram.
ENTER

Step 5. Use the arrows to navigate to <Xlist>.

Step 6. If "L1" is not selected, select it.
2nd, [L1], ENTER

Step 7. Use the arrows to navigate to <Freq>.

Step 8. Assign the frequencies to [L2].
2nd, [L2], ENTER

Step 9. Go back to access other graphs.
2nd, [STAT PLOT]

Step 10. Use the arrows to turn off the remaining plots.

Step 11. Be sure to deselect or clear all equations before graphing.

To  deselect  equations: 

Step 1. Access the list of equations.
Y=

Step 2. Select each equal sign (=).
ENTER

Step 3. Continue until all equations are deselected.

To clear equations:

Step 1. Access the list of equations.
Y=

Step 2. Use the arrow keys to navigate to the right of each equal sign (=) and clear them.
CLEAR

Step 3. Repeat until all equations are deleted.
To  draw  default  histogram: 

Step  1.  Access  the  ZOOM  menu. 


ZOOM 


Step  2.  Select  <9 :  ZoomStat> 

Step  3.  The  histogram  will  show  with  a  window  automatically  set. 
To  draw  custom  histogram: 


Step 2. Access WINDOW to set the graph parameters.

• Xmin = -2.5
• Xmax = 3.5
• Xscl = 1 (width of bars)
• Ymin = 0
• Ymax = 10
• Yscl = 1 (spacing of tick marks on y-axis)
• Xres = 1

Step 3. Access GRAPH to see the histogram.

To  draw  box  plots: 

Step 1. Access graphing mode.
2nd, [STAT PLOT]

Step 2. Select <1:Plot 1> to access the first graph.

Step 3. Use the arrows to select <ON> and turn on Plot 1.

Step 4. Use the arrows to select the box plot picture and enable it.

Step 5. Use the arrows to navigate to <Xlist>.

Step 6. If "L1" is not selected, select it.
2nd, [L1], ENTER

Step 7. Use the arrows to navigate to <Freq>.

Step 8. Indicate that the frequencies are in [L2].
2nd, [L2], ENTER

Step 9. Go back to access other graphs.
2nd, [STAT PLOT]

Step 10. Be sure to deselect or clear all equations before graphing using the method mentioned above.

Step 11. View the box plot.
GRAPH
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14.9.4  Linear  Regression 
14.9.4.1  Sample  Data 

The  following  data  is  real.  The  percent  of  declared  ethnic  minority  students  at  De  Anza  College  for  selected 
years  from  1970  - 1995  was: 


Year    Student Ethnic Minority Percentage
1970    14.13
1973    12.27
1976    14.08
1979    18.16
1982    27.64
1983    28.72
1986    31.86
1989    33.14
1992    45.37
1995    53.1

Table 14.19: The independent variable is "Year," while the dependent variable is "Student Ethnic Minority Percent."


[Figure: scatter plot of Student Ethnic Minority Percentage (vertical axis, 0 to 60) versus Year (horizontal axis, 1960 to 2000)]

Figure 14.6: By hand, verify the scatterplot above.
NOTE: The TI-83 has a built-in linear regression feature, which allows the data to be edited. The x-values will be in [L1]; the y-values in [L2].

To  enter  data  and  do  linear  regression: 

Step 1. ON turns the calculator on.

Step 2. Before accessing this program, be sure to turn off all plots.

• Access graphing mode.
2nd, [STAT PLOT]

• Turn off all plots.

Step 3. Round to 3 decimal places. To do so:

• Access the mode menu.
MODE

• Navigate to <Float> and then to the right to <3>.
ENTER

• All numbers will be rounded to 3 decimal places until changed.

Step 4. Enter statistics mode and clear lists [L1] and [L2], as described above.
STAT

Step 5. Enter editing mode to insert values for x and y.
STAT, ENTER

Step 6. Enter each value. Press ENTER to continue.

To  display  the  correlation  coefficient: 

Step 1. Access the catalog.
2nd, [CATALOG]

Step 2. Arrow down and select <DiagnosticOn>.
ENTER, ENTER

Step 3. r and r² will be displayed during regression calculations.

Step 4. Access linear regression.

Step 5. Select the form of y = a + bx.
ENTER

The display will show:

LinReg

• y = a + bx
• a = -3176.909
• b = 1.617
• r² = 0.924
• r = 0.961
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This  means  the  Line  of  Best  Fit  (Least  Squares  Line)  is: 

• y = -3176.909 + 1.617x

• Percent = -3176.909 + 1.617 (year #)

The correlation coefficient is r = 0.961.
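
The LinReg output can be reproduced without the calculator. The sketch below is a plain-Python least-squares fit of the data in Table 14.19. It uses the standard computational sums for the slope, intercept, and correlation coefficient, which should agree with the calculator to the displayed precision. The variable names are illustrative.

```python
import math

# Data from Table 14.19.
years = [1970, 1973, 1976, 1979, 1982, 1983, 1986, 1989, 1992, 1995]
percents = [14.13, 12.27, 14.08, 18.16, 27.64, 28.72, 31.86, 33.14, 45.37, 53.1]

n = len(years)
sum_x = sum(years)
sum_y = sum(percents)
sum_xy = sum(x * y for x, y in zip(years, percents))
sum_x2 = sum(x * x for x in years)
sum_y2 = sum(y * y for y in percents)

# Slope b and intercept a of the least squares line y = a + bx.
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = (sum_y - b * sum_x) / n

# Correlation coefficient r.
r = (n * sum_xy - sum_x * sum_y) / math.sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(round(a, 3), round(b, 3), round(r ** 2, 3), round(r, 3))
```
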
To  see  the  scatter  plot: 

Step 1. Access graphing mode.
2nd, [STAT PLOT]

Step 2. Select <1:Plot 1> to access plotting for the first graph.
ENTER

Step 3. Navigate and select <ON> to turn on Plot 1.
<ON>, ENTER

Step 4. Navigate to the first picture.

Step 5. Select the scatter plot.
ENTER

Step 6. Navigate to <Xlist>.

Step 7. If [L1] is not selected, press 2nd, [L1] to select it.

Step 8. Confirm that the data values are in [L1].
ENTER

Step 9. Navigate to <Ylist>.

Step 10. Indicate that the y-values are in [L2].
2nd, [L2], ENTER

Step 11. Go back to access other graphs.
2nd, [STAT PLOT]

Step 12. Use the arrows to turn off the remaining plots.

Step 13. Access WINDOW to set the graph parameters.

• Xmin = 1970
• Xmax = 2000
• Xscl = 10 (spacing of tick marks on x-axis)
• Ymin = -0.05
• Ymax = 60
• Yscl = 10 (spacing of tick marks on y-axis)
• Xres = 1

Step 14. Be sure to deselect or clear all equations before graphing, using the instructions above.

Step 15. Press GRAPH to see the scatter plot.

To  see  the  regression  graph: 

Step 1. Access the equation menu. The regression equation will be put into Y1.

Step 2. Access the vars menu and navigate to <5:Statistics>.

Step 3. Navigate to <EQ>.

Step 4. <1:RegEQ> contains the regression equation, which will be entered in Y1.
ENTER

Step 5. Press GRAPH. The regression line will be superimposed over the scatter plot.

To  see  the  residuals  and  use  them  to  calculate  the  critical  point  for  an  outlier: 

Step 1. Access the list. RESID will be an item on the menu. Navigate to it.
2nd, [LIST], <RESID>

Step 2. Confirm twice to view the list of residuals. Use the arrows to select them.
ENTER, ENTER

Step 3. The critical point for an outlier is $1.9\sqrt{\dfrac{SSE}{n-2}}$, where:

• n = number of pairs of data
• SSE = sum of the squared errors, $\sum \text{residual}^2$

Step 4. Store the residuals in [L3].
STO→, 2nd, [L3], ENTER

Step 5. Calculate $\dfrac{(\text{residual})^2}{n-2}$. Note that n - 2 = 8.
2nd, [L3], x², ÷, 8

Step 6. Store this value in [L4].
STO→, 2nd, [L4], ENTER

Step 7. Calculate the critical value using the equation above.
1, ., 9, ×, 2nd, [√], 2nd, [LIST], right arrow, right arrow, 5, 2nd, [L4], ), ), ENTER

Step 8. Verify that the calculator displays: 7.642669563. This is the critical value.

Step 9. Compare the absolute value of each residual value in [L3] to 7.64. If the absolute value is greater than 7.64, then the corresponding (x, y) point is an outlier. In this case, none of the points is an outlier.
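
The outlier check above can be sketched in Python as follows. This assumes the same least-squares line as the calculator and the cut-off 1.9s with s = sqrt(SSE / (n - 2)). The variable names are illustrative.

```python
import math

# Data from Table 14.19.
years = [1970, 1973, 1976, 1979, 1982, 1983, 1986, 1989, 1992, 1995]
percents = [14.13, 12.27, 14.08, 18.16, 27.64, 28.72, 31.86, 33.14, 45.37, 53.1]

n = len(years)
x_bar = sum(years) / n
y_bar = sum(percents) / n

# Least squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, percents)) / sum(
    (x - x_bar) ** 2 for x in years
)
a = y_bar - b * x_bar

# Residual = observed y minus estimated y.
residuals = [y - (a + b * x) for x, y in zip(years, percents)]
sse = sum(e ** 2 for e in residuals)

# Critical point for an outlier: 1.9 * sqrt(SSE / (n - 2)).
critical = 1.9 * math.sqrt(sse / (n - 2))

# A point is flagged when the absolute value of its residual exceeds the cut-off.
outliers = [
    (x, y) for x, y, e in zip(years, percents, residuals) if abs(e) > critical
]

print(critical)
print(outliers)
```
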

To  obtain  estimates  of  y  for  various  x-values: 

There are various ways to determine estimates for "y". One way is to substitute values for "x" in the equation. Another way is to use the TRACE key on the graph of the regression line.


14.9.5  TI-83,  83+,  84  instructions  for  distributions  and  tests 
14.9.5.1  Distributions 

Access  DISTR  (for  "Distributions"). 

For technical assistance, visit the Texas Instruments website at http://www.ti.com and enter your calculator model into the "search" box.

Binomial  Distribution 

• binompdf(n, p, x) corresponds to P(X = x)

• binomcdf(n, p, x) corresponds to P(X ≤ x)

• To see a list of all probabilities for x: 0, 1, ..., n, leave off the "x" parameter.

Poisson  Distribution 

• poissonpdf(λ, x) corresponds to P(X = x)

• poissoncdf(λ, x) corresponds to P(X ≤ x)
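
For readers without a calculator at hand, the four discrete functions can be imitated in a few lines of Python. This is a sketch that borrows the calculator's function names for readability; it is not TI's implementation. It assumes the standard binomial and Poisson probability formulas.

```python
from math import comb, exp, factorial


def binompdf(n: int, p: float, x: int) -> float:
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomcdf(n: int, p: float, x: int) -> float:
    """P(X <= x) for X ~ B(n, p): add the probabilities for 0, 1, ..., x."""
    return sum(binompdf(n, p, k) for k in range(x + 1))


def poissonpdf(lam: float, x: int) -> float:
    """P(X = x) for a Poisson RV with mean lam."""
    return lam ** x * exp(-lam) / factorial(x)


def poissoncdf(lam: float, x: int) -> float:
    """P(X <= x) for a Poisson RV with mean lam."""
    return sum(poissonpdf(lam, k) for k in range(x + 1))


# Hypothetical inputs chosen only for illustration.
print(binompdf(10, 0.5, 5), binomcdf(10, 0.5, 5))
print(poissonpdf(2, 0), poissoncdf(2, 1))
```
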


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Continuous  Distributions  (general) 

• -∞ uses the value -1EE99 for left bound

• ∞ uses the value 1EE99 for right bound

Normal  Distribution 

• normalpdf(x, μ, σ) yields a probability density function value (only useful to plot the normal curve, in which case "x" is the variable)

• normalcdf(left bound, right bound, μ, σ) corresponds to P(left bound < X < right bound)

• normalcdf(left bound, right bound) corresponds to P(left bound < Z < right bound) - standard normal

• invNorm(p, μ, σ) yields the critical value, k: P(X < k) = p

• invNorm(p) yields the critical value, k: P(Z < k) = p for the standard normal
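
The normal functions have close analogues in Python's standard `statistics.NormalDist`. The sketch below wraps them under names resembling the calculator's. The wrapper names `normalcdf` and `inv_norm` are assumptions for illustration, and -1e99 stands in for the calculator's -1EE99.

```python
from statistics import NormalDist


def normalcdf(left: float, right: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(left < X < right) for X normal with mean mu and standard deviation sigma."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(right) - dist.cdf(left)


def inv_norm(p: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """The value k with P(X < k) = p."""
    return NormalDist(mu, sigma).inv_cdf(p)


# A very large negative number plays the role of negative infinity.
print(normalcdf(-1e99, 1.0))    # P(Z < 1)
print(normalcdf(-1.96, 1.96))   # central area of the standard normal
print(inv_norm(0.975))          # critical value for an area of 0.975 to the left
```
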

Student-t  Distribution 

• tpdf(x, df) yields the probability density function value (only useful to plot the student-t curve, in which case "x" is the variable)

• tcdf(left bound, right bound, df) corresponds to P(left bound < t < right bound)

Chi-square Distribution

• χ²pdf(x, df) yields the probability density function value (only useful to plot the chi² curve, in which case "x" is the variable)

• χ²cdf(left bound, right bound, df) corresponds to P(left bound < χ² < right bound)

F Distribution

• Fpdf(x, df num, df denom) yields the probability density function value (only useful to plot the F curve, in which case "x" is the variable)

• Fcdf(left bound, right bound, df num, df denom) corresponds to P(left bound < F < right bound)

14.9.5.2  Tests  and  Confidence  Intervals 

Access  STAT  and  TESTS. 

For the Confidence Intervals and Hypothesis Tests, you may enter the data into the appropriate lists and press DATA to have the calculator find the sample means and standard deviations. Or, you may enter the sample means and sample standard deviations directly by pressing STAT once in the appropriate tests.

Confidence  Intervals 

• ZInterval is the confidence interval for mean when σ is known

• TInterval is the confidence interval for mean when σ is unknown; s estimates σ.

• 1-PropZInt is the confidence interval for proportion

NOTE: The confidence levels should be given as percents (ex. enter "95" or ".95" for a 95% confidence level).
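
To show what ZInterval computes, here is a minimal Python sketch of a confidence interval for a mean when σ is known. The function name `z_interval` and the sample figures (mean 68, σ = 3, n = 36) are hypothetical. Like the calculator, the sketch accepts the confidence level either as a percent or as a decimal.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar: float, sigma: float, n: int, confidence_level: float):
    """Confidence interval for a population mean when sigma is known."""
    # Accept "90" or ".90" for a 90% confidence level.
    if confidence_level > 1:
        confidence_level = confidence_level / 100
    alpha = 1 - confidence_level
    # z cuts off an area of alpha / 2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    error_bound = z * sigma / sqrt(n)
    return (x_bar - error_bound, x_bar + error_bound)


# Hypothetical sample: mean 68, known sigma 3, sample size 36, 90% confidence.
lower, upper = z_interval(68, 3, 36, 90)
print(lower, upper)
```
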

Hypothesis  Tests 

• Z-Test is the hypothesis test for single mean when σ is known

• T-Test is the hypothesis test for single mean when σ is unknown; s estimates σ.

• 2-SampZTest is the hypothesis test for 2 independent means when both σ's are known

• 2-SampTTest is the hypothesis test for 2 independent means when both σ's are unknown

• 1-PropZTest is the hypothesis test for single proportion.

• 2-PropZTest is the hypothesis test for 2 proportions.

• χ²-Test is the hypothesis test for independence.

• χ²GOF-Test is the hypothesis test for goodness-of-fit (TI-84+ only).

• LinRegTTEST is the hypothesis test for Linear Regression (TI-84+ only).

NOTE: Input the null hypothesis value in the row below "Inpt." For a test of a single mean, "μ0" represents the null hypothesis. For a test of a single proportion, "p0" represents the null hypothesis. Enter the alternate hypothesis on the bottom row.
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Solutions  to  Exercises  in  Chapter  14 

Solutions  to  Practice  Final  Exam  1 

Solution  to  Exercise  14.1.1  (p.  613) 

B:  Independent. 

Solution  to  Exercise  14.1.2  (p.  613) 

^-  16 

Solution  to  Exercise  14.1.3  (p.  613) 

B:  Two  measurements  are  drawn  from  the  same  pair  of  individuals  or  objects. 

Solution  to  Exercise  14.1.4  (p.  614) 

B- 

Solution  to  Exercise  14.1.5  (p.  614) 

^-  52 

Solution  to  Exercise  14.1.6  (p.  614) 

B-  ^ 

40 

Solution  to  Exercise  14.1.7  (p.  614) 

B:  2.78 

Solution  to  Exercise  14.1.8  (p.  615) 

A:  8.25 

Solution  to  Exercise  14.1.9  (p.  615) 

C:  0.2870 

Solution  to  Exercise  14.1.10  (p.  615) 

C:  Normal 

Solution  to  Exercise  14.1.11  (p.  615) 

D:  Ha-.  Pa  ^  Pb 

Solution  to  Exercise  14.1.12  (p.  615) 

B: conclude that the pass rate for Math 1A is different than the pass rate for Math 1B when, in fact, the pass rates are the same.


Solution to Exercise 14.1.13 (p. 616)

B: not reject $H_0$

Solution to Exercise 14.1.14 (p. 616)

C: Iris

Solution to Exercise 14.1.15 (p. 616)

C: Student's-t

Solution to Exercise 14.1.16 (p. 617)

B: is left-tailed

Solution to Exercise 14.1.17 (p. 617)

C: cluster sampling

Solution to Exercise 14.1.18 (p. 617)

B: Median

Solution to Exercise 14.1.19 (p. 617)

A: the probability that an outcome of the data will happen purely by chance when the null hypothesis is true.

Solution  to  Exercise  14.1.20  (p.  618) 

D:  stratified 

Solution  to  Exercise  14.1.21  (p.  618) 

B:25 

Solution  to  Exercise  14.1.22  (p.  618) 

C:4 

Solution  to  Exercise  14.1.23  (p.  618) 
A:  (1.85, 2.32) 
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Solution  to  Exercise  14.1.24  (p.  618) 
C:  Both  above  are  correct. 
Solution  to  Exercise  14.1.25  (p.  619) 

C:5.8 

Solution  to  Exercise  14.1.26  (p.  619) 

C:  0.6321 

Solution  to  Exercise  14.1.27  (p.  619) 

A:  0.8413 

Solution  to  Exercise  14.1.28  (p.  619) 

A:  (0.6030,  0.7954) 

Solution  to  Exercise  14.1.29  (p.  619) 


Solution to Exercise 14.1.30 (p. 620)

D: 3.66

Solution to Exercise 14.1.31 (p. 620)

B: 5.1

Solution to Exercise 14.1.32 (p. 620)

A: 13.46

Solution to Exercise 14.1.33 (p. 620)

B: There is a strong linear pattern. Therefore, it is most likely a good model to be used.
Solution  to  Exercise  14.1.34  (p.  621) 

B:  Chfs 

Solution  to  Exercise  14.1.35  (p.  621) 

D:70 

Solution  to  Exercise  14.1.36  (p.  621) 

B:  There  is  sufficient  evidence  to  conclude  that  the  choice  of  major  and  the  gender  of  the  student  are  not 
independent  of  each  other. 
Solution  to  Exercise  14.1.37  (p.  621) 

A: Chi² goodness of fit

Solutions  to  Practice  Final  Exam  2 

Solution  to  Exercise  14.2.1  (p.  622) 

B:  parameter 

Solution  to  Exercise  14.2.2  (p.  622) 

A 

Solution  to  Exercise  14.2.3  (p.  623) 

C:7 

Solution  to  Exercise  14.2.4  (p.  623) 

C:  0.02 

Solution  to  Exercise  14.2.5  (p.  623) 

C:  none  of  the  above 

Solution  to  Exercise  14.2.6  (p.  623) 

^-  140 

Solution  to  Exercise  14.2.7  (p.  624) 

A:  wO 

Solution  to  Exercise  14.2.8  (p.  624) 
B:  The  values  for  x  are:  {1, 2, 3, 14} 
Solution  to  Exercise  14.2.9  (p.  624) 

C:  0.9417 
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Solution  to  Exercise  14.2.10  (p.  624) 

D:  Binomial 

Solution  to  Exercise  14.2.11  (p.  625) 

D:8.7 

Solution  to  Exercise  14.2.12  (p.  625) 

A:  -1.96 

Solution  to  Exercise  14.2.13  (p.  625) 

A:  0.6321 

Solution  to  Exercise  14.2.14  (p.  626) 

D:  360 

Solution to Exercise 14.2.15 (p. 626)

Solution to Exercise 14.2.16 (p. 626)

Solution to Exercise 14.2.17 (p. 626)

D

Solution to Exercise 14.2.18 (p. 627)

B: 5.5

Solution to Exercise 14.2.19 (p. 627)

D: 6.92

Solution to Exercise 14.2.20 (p. 627)

A: 5

Solution to Exercise 14.2.21 (p. 627)

B: 0.8541

Solution to Exercise 14.2.22 (p. 628)

B: 0.2

Solution to Exercise 14.2.23 (p. 628)

A: -1

Solution to Exercise 14.2.24 (p. 628)

C: matched pairs, dependent groups

Solution to Exercise 14.2.25 (p. 628)

D: Reject $H_0$. There is sufficient evidence to conclude that there is a difference in the mean scores.

Solution to Exercise 14.2.26 (p. 629)

C: There is sufficient evidence to conclude that the proportion for males is higher than the proportion for females.

Solution to Exercise 14.2.27 (p. 629)

B: No

Solution to Exercise 14.2.28 (p. 629)

B: p-value is close to 1.

Solution to Exercise 14.2.29 (p. 629)

B: No

Solution to Exercise 14.2.30 (p. 630)

C: y = 79.96x - 0.0094

Solution to Exercise 14.2.31 (p. 630)

D: We should not be estimating here.

Solution to Exercise 14.2.32 (p. 630)

A

Solution to Exercise 14.2.33 (p. 630)

A: The p-value is > 0.10. There is insufficient information to conclude that the distribution is not uniform.

Solution to Exercise 14.2.34 (p. 631)

C: The test is to determine if the different groups have the same means.
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NOTE:  When  you  are  finished  with  the  table  link,  use  the  back  button  on  your  browser  to  return 
here. 

Tables  (NIST/SEMATECH  e-Handbook  of  Statistical  Methods,  http://www.itl.nist.gov/div898/handbook/, 
January  3,  2009) 

• Student-t table (http://www.itl.nist.gov/div898/handbook/eda/section3/eda3672.htm)

• Normal table (http://www.itl.nist.gov/div898/handbook/eda/section3/eda3671.htm)

• Chi-Square table (http://www.itl.nist.gov/div898/handbook/eda/section3/eda3674.htm)

• F-table (http://www.itl.nist.gov/div898/handbook/eda/section3/eda3673.htm)

• All four tables can be accessed by going to http://www.itl.nist.gov/div898/handbook/eda/section3/eda367.htm

95% Critical Values of the Sample Correlation Coefficient Table

• 95% Critical Values of the Sample Correlation Coefficient (http://cnx.org/content/m17098/latest/)

NOTE: The url for this table is http://cnx.org/content/m17098/latest/

This content is available online at <http://cnx.org/content/m19138/1.3/>.
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A  Analysis  of  Variance 

Also referred to as ANOVA. A method of testing whether or not the means of three or more populations are equal. The method is applicable if:

•  All  populations  of  interest  are  normally  distributed. 

•  The  populations  have  equal  standard  deviations. 

•  Samples  (not  necessarily  of  the  same  size)  are  randomly  and  independently  selected  from 
each  population. 

The  test  statistic  for  analysis  of  variance  is  the  F-ratio. 

Average 

A  number  that  describes  the  central  tendency  of  the  data.  There  are  a  number  of  specialized 
averages,  including  the  arithmetic  mean,  weighted  mean,  median,  mode,  and  geometric  mean. 

B   Bernoulli  Trials 

An  experiment  with  the  following  characteristics: 

•  There  are  only  2  possible  outcomes  called  "success"  and  "failure"  for  each  trial. 

• The probability p of a success is the same for any trial (so the probability q = 1 - p of a failure is the same for any trial).

Binomial  Distribution 

A  discrete  random  variable  (RV)  which  arises  from  Bernoulli  trials.  There  are  a  fixed  number,  n, 
of  independent  trials.  "Independent"  means  that  the  result  of  any  trial  (for  example,  trial  1) 
does  not  affect  the  results  of  the  following  trials,  and  all  trials  are  conducted  under  the  same 
conditions.  Under  these  circumstances  the  binomial  RV  X  is  defined  as  the  number  of  successes 
in n trials. The notation is: $X \sim B(n, p)$. The mean is $\mu = np$ and the standard deviation is $\sigma = \sqrt{npq}$. The probability of exactly x successes in n trials is $P(X = x) = \binom{n}{x} p^{x} q^{n-x}$.

C   Central  Limit  Theorem 

Given a random variable (RV) with known mean $\mu$ and known standard deviation $\sigma$. We are sampling with size n and we are interested in two new RVs - the sample mean, $\bar{X}$, and the sample sum, $\Sigma X$. If the size n of the sample is sufficiently large, then $\bar{X} \sim N\left(\mu, \frac{\sigma}{\sqrt{n}}\right)$ and $\Sigma X \sim N\left(n\mu, \sqrt{n}\,\sigma\right)$. If the size n of the sample is sufficiently large, then the distribution of the sample means and the distribution of the sample sums will approximate a normal distribution regardless of the shape of the population. The mean of the sample means will equal the population mean and the mean of the sample sums will equal n times the population mean. The standard deviation of the distribution of the sample means, $\frac{\sigma}{\sqrt{n}}$, is called the standard error of the mean.

Coefficient  of  Correlation 
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A  measure  developed  by  Karl  Pearson  (early  1900s)  that  gives  the  strength  of  association 
between  the  independent  variable  and  the  dependent  variable.  The  formula  is: 

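$r = \dfrac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$

where n is the number of data points. The coefficient cannot be more than 1 or less than -1. The closer the coefficient is to ±1, the stronger the evidence of a significant linear relationship between x and y.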

Conditional  Probability 

The likelihood that an event will occur given that another event has already occurred.
Confidence  Interval  (CI) 

An  interval  estimate  for  an  unknown  population  parameter.  This  depends  on: 

•  The  desired  confidence  level. 

•  Information  that  is  known  about  the  distribution  (for  example,  known  standard  deviation). 

•  The  sample  and  its  size. 

Confidence  Level  (CL) 

The percent expression for the probability that the confidence interval contains the true population parameter. For example, if the CL = 90%, then in 90 out of 100 samples the interval estimate will enclose the true population parameter.

Contingency  Table 

The  method  of  displaying  a  frequency  distribution  as  a  table  with  rows  and  columns  to  show 
how  two  variables  may  be  dependent  (contingent)  upon  each  other.  The  table  provides  an  easy 
way  to  calculate  conditional  probabilities. 

Continuous  Random  Variable 

A  random  variable  (RV)  whose  outcomes  are  measured. 

Example:  The  height  of  trees  in  the  forest  is  a  continuous  RV. 

Cumulative  Relative  Frequency 

The  term  applies  to  an  ordered  set  of  observations  from  smallest  to  largest.  The  Cumulative 
Relative  Frequency  is  the  sum  of  the  relative  frequencies  for  all  values  that  are  less  than  or  equal 
to  the  given  value. 




D  Data 


A  set  of  observations  (a  set  of  possible  outcomes).  Most  data  can  be  put  into  two  groups: 
qualitative  (hair  color,  ethnic  groups  and  other  attributes  of  the  population)  and  quantitative 
(distance  traveled  to  college,  number  of  children  in  a  family,  etc.).  Quantitative  data  can  be 
separated  into  two  subgroups:  discrete  and  continuous.  Data  is  discrete  if  it  is  the  result  of 
counting  (the  number  of  students  of  a  given  ethnic  group  in  a  class,  the  number  of  books  on  a 
shelf,  etc.).  Data  is  continuous  if  it  is  the  result  of  measuring  (distance  traveled,  weight  of 
luggage,  etc.) 

Degrees  of  Freedom  (df) 

The number of objects in a sample that are free to vary.
Discrete  Random  Variable 
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A  random  variable  (RV)  whose  outcomes  are  counted. 

E   Equally  Likely 

Each  outcome  of  an  experiment  has  the  same  probability. 

Error  Bound  for  a  Population  Mean  (EBM) 

The  margin  of  error.  Depends  on  the  confidence  level,  sample  size,  and  known  or  estimated 
population  standard  deviation. 

Error Bound for a Population Proportion (EBP)

The  margin  of  error.  Depends  on  the  confidence  level,  sample  size,  and  the  estimated  (from  the 
sample)  proportion  of  successes. 

Event 

A  subset  in  the  set  of  all  outcomes  of  an  experiment.  The  set  of  all  outcomes  of  an  experiment  is 
called  a  sample  space  and  denoted  usually  by  S.  An  event  is  any  arbitrary  subset  in  S.  It  can 
contain  one  outcome,  two  outcomes,  no  outcomes  (empty  subset),  the  entire  sample  space,  etc. 
Standard  notations  for  events  are  capital  letters  such  as  A,  B,  C,  etc. 

Expected  Value 

Expected arithmetic average when an experiment is repeated many times. (Also called the mean). Notations: $E(x)$, $\mu$. For a discrete random variable (RV) with probability distribution function $P(x)$, the definition can also be written in the form $E(x) = \mu = \sum x P(x)$.

Experiment 

A  planned  activity  carried  out  under  controlled  conditions. 

Exponential  Distribution 

A continuous random variable (RV) that appears when we are interested in the intervals of time between some random events, for example, the length of time between emergency arrivals at a hospital. Notation: $X \sim Exp(m)$. The mean is $\mu = \frac{1}{m}$ and the standard deviation is $\sigma = \frac{1}{m}$. The probability density function is $f(x) = m e^{-mx}$, $x \ge 0$, and the cumulative distribution function is $P(X \le x) = 1 - e^{-mx}$.
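
The exponential density and cumulative distribution function translate directly into code. The sketch below is illustrative only; the rate m = 0.25 is a hypothetical value, and the function names are assumptions.

```python
from math import exp


def exp_pdf(x: float, m: float) -> float:
    """Density f(x) = m * e^(-m x) for x >= 0."""
    return m * exp(-m * x)


def exp_cdf(x: float, m: float) -> float:
    """P(X <= x) = 1 - e^(-m x) for x >= 0."""
    return 1 - exp(-m * x)


m = 0.25          # hypothetical decay parameter
mean = 1 / m      # the mean and the standard deviation are both 1 / m

# Probability that X falls at or below its own mean.
print(exp_cdf(mean, m))
```

Whatever the value of m, the probability of falling at or below the mean works out to 1 - e^(-1), because m times 1/m is 1.
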

F  Frequency 

The  number  of  times  a  value  of  the  data  occurs. 

G  Geometric  Distribution 

A discrete random variable (RV) that arises from Bernoulli trials. The trials are repeated
until the first success. The geometric variable X is defined as the number of trials until the first
success. Notation: X ~ G(p). The mean is $\mu = \frac{1}{p}$ and the standard deviation is
$\sigma = \sqrt{\frac{1}{p}\left(\frac{1}{p} - 1\right)}$. The probability that the first success occurs on trial x, that is, after
exactly x - 1 failures, is given by the formula: $P(X = x) = p(1 - p)^{x-1}$.

H  Hypergeometric  Distribution

A discrete random variable (RV) that is characterized by

•  A fixed number of trials.

•  The probability of success is not the same from trial to trial.

We sample from two groups of items when we are interested in only one group. X is defined as
the number of successes out of the total number of items chosen. Notation: X ~ H(r, b, n),
where r = the number of items in the group of interest, b = the number of items in the group not
of interest, and n = the number of items chosen.
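
The glossary entry does not state the probability formula for the hypergeometric distribution. The sketch below uses the standard counting formula, $P(X = x) = \binom{r}{x}\binom{b}{n-x} / \binom{r+b}{n}$, with the r, b, n notation above; the function name and the numbers in the example are assumptions for illustration.

```python
from math import comb

def hypergeom_pmf(x, r, b, n):
    """P(X = x): x items from the group of interest (size r) when n items
    are drawn without replacement from r + b items in total."""
    if x < 0 or x > r or n - x < 0 or n - x > b:
        return 0.0
    return comb(r, x) * comb(b, n - x) / comb(r + b, n)

# Hypothetical example: 6 items of interest, 5 others, 4 items chosen.
r, b, n = 6, 5, 4
p_two = hypergeom_pmf(2, r, b, n)
print(round(p_two, 4))
```

Summing `hypergeom_pmf` over every possible x gives 1, which is a quick sanity check on the counts.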

Hypothesis 

A statement about the value of a population parameter. In case of two hypotheses, the statement
assumed to be true is called the null hypothesis (notation $H_0$) and the contradictory statement is
called the alternate hypothesis (notation $H_a$).

Hypothesis  Testing 

Based on sample evidence, a procedure to determine whether the hypothesis stated is a
reasonable statement and cannot be rejected, or is unreasonable and should be rejected.

I    Independent  Events 

The  occurrence  of  one  event  has  no  effect  on  the  probability  of  the  occurrence  of  any  other  event. 
Events A and B are independent if one of the following is true: (1) $P(A|B) = P(A)$; (2)
$P(B|A) = P(B)$; (3) $P(A \text{ and } B) = P(A)P(B)$.
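
Condition (3) is easy to check by counting when all outcomes are equally likely. The sketch below assumes a fair six-sided die; the helper names and the chosen events are hypothetical.

```python
from fractions import Fraction

def prob(event, sample_space):
    """P(event) when every outcome in the sample space is equally likely."""
    return Fraction(len(event & sample_space), len(sample_space))

def are_independent(a, b, sample_space):
    """Check condition (3): P(A and B) = P(A) * P(B)."""
    return prob(a & b, sample_space) == prob(a, sample_space) * prob(b, sample_space)

S = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
A = {2, 4, 6}            # the roll is even
B = {1, 2, 3, 4}         # the roll is at most 4
C = {1, 2, 3}            # the roll is at most 3

print(are_independent(A, B, S))  # True:  1/3 == 1/2 * 2/3
print(are_independent(A, C, S))  # False: 1/6 != 1/2 * 1/2
```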

Inferential  Statistics 

Also  called  statistical  inference  or  inductive  statistics.  This  facet  of  statistics  deals  with 
estimating  a  population  parameter  based  on  a  sample  statistic.  For  example,  if  4  out  of  the  100 
calculators  sampled  are  defective  we  might  infer  that  4  percent  of  the  production  is  defective. 

Interquartile  Range  (IQR)

The distance between the third quartile (Q3) and the first quartile (Q1). IQR = Q3 - Q1.

L    Level  of  Significance  of  the  Test 

Probability of a Type 1 error (reject the null hypothesis when it is true). Notation: $\alpha$. In
hypothesis testing, the Level of Significance is called the preconceived $\alpha$ or the preset $\alpha$.

M  Mean 

A number that measures the central tendency. A common name for mean is 'average.' The term
'mean' is a shortened form of 'arithmetic mean.' By definition, the mean for a sample (denoted
by $\bar{x}$) is $\bar{x} = \frac{\text{Sum of all values in the sample}}{\text{Number of values in the sample}}$, and the mean for a population (denoted by $\mu$) is
$\mu = \frac{\text{Sum of all values in the population}}{\text{Number of values in the population}}$.

Median 

A  number  that  separates  ordered  data  into  halves.  Half  the  values  are  the  same  number  or 
smaller  than  the  median  and  half  the  values  are  the  same  number  or  larger  than  the  median. 
The  median  may  or  may  not  be  part  of  the  data. 

Mode 

The  value  that  appears  most  frequently  in  a  set  of  data. 
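
The three measures of center defined above (mean, median, mode) can be compared on one small data set. The data values here are hypothetical; the functions come from Python's standard `statistics` module.

```python
import statistics

data = [1, 2, 2, 3, 4, 7, 9]   # hypothetical sample, already ordered

sample_mean = sum(data) / len(data)      # sum of values / number of values
sample_median = statistics.median(data)  # middle value of the ordered data
sample_mode = statistics.mode(data)      # most frequent value

print(sample_mean, sample_median, sample_mode)  # 4.0 3 2
```

The single large value 9 pulls the mean above the median, which is why the median is often preferred for skewed data.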
Mutually  Exclusive 

An  observation  cannot  fall  into  more  than  one  class  (category).  Being  in  more  than  one  category 
prevents  being  in  a  mutually  exclusive  category. 

N  Normal  Distribution

A continuous random variable (RV) with pdf $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2 / (2\sigma^2)}$, where $\mu$ is the mean of
the distribution and $\sigma$ is the standard deviation. Notation: X ~ N($\mu$, $\sigma$). If $\mu = 0$ and $\sigma = 1$, the
RV is called the standard normal distribution.
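
A minimal sketch of the density above, cross-checked against `statistics.NormalDist` from the Python standard library. The function name `normal_pdf` and the sample arguments are illustrative assumptions.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu, sigma):
    """Density of N(mu, sigma) at x, written out from the formula."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

# Peak of the standard normal distribution, at x = 0.
peak = normal_pdf(0.0, 0.0, 1.0)

# The hand-written formula should agree with the library implementation.
by_hand = normal_pdf(1.5, 1.0, 2.0)
by_library = NormalDist(mu=1.0, sigma=2.0).pdf(1.5)
print(peak, by_hand, by_library)
```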


O  Outcome  (observation) 

A particular result of an experiment.
Outlier 

An  observation  that  does  not  fit  the  rest  of  the  data. 


P  p-value 

The probability that an event will happen purely by chance assuming the null hypothesis is true.
The  smaller  the  p-value,  the  stronger  the  evidence  is  against  the  null  hypothesis. 

Parameter 

A  numerical  characteristic  of  the  population. 
Percentile 

A  number  that  divides  ordered  data  into  hundredths. 

Example: Let a data set contain 200 ordered observations starting with {2.3, 2.7, 2.8, 2.9, 2.9, 3.0, ...}.
Then the first percentile is $\frac{2.7 + 2.8}{2} = 2.75$, because 1% of the data is to the left of this point on
the number line and 99% of the data is on its right. The second percentile is $\frac{2.9 + 2.9}{2} = 2.9$.
Percentiles may or may not be part of the data. In this example, the first percentile is not in the
data, but the second percentile is. The median of the data is the second quartile and the 50th
percentile. The first and third quartiles are the 25th and the 75th percentiles, respectively.
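
The arithmetic in this example can be reproduced with a short sketch. It assumes the midpoint convention used above (average the two observations on either side of the cut) and only handles cuts that fall exactly between two observations. Only the first six data values are given in the text, so the remaining 194 are padded with a placeholder value of 3.0, which does not affect the first two percentiles.

```python
def percentile_midpoint(sorted_data, k):
    """k-th percentile as the midpoint of the two observations around the
    cut, assuming k% of the sample size is a whole number."""
    n = len(sorted_data)
    position = n * k / 100
    if position != int(position) or not 0 < position < n:
        raise ValueError("this sketch needs k% of n to be a whole number")
    i = int(position)
    return (sorted_data[i - 1] + sorted_data[i]) / 2

known = [2.3, 2.7, 2.8, 2.9, 2.9, 3.0]
data = known + [3.0] * (200 - len(known))   # placeholder padding, not real data

first = percentile_midpoint(data, 1)
second = percentile_midpoint(data, 2)
print(first, second)
```

Other software packages use different interpolation rules, so their percentiles for the same data may differ slightly.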

Point  Estimate 

A  single  number  computed  from  a  sample  and  used  to  estimate  a  population  parameter. 
Poisson  Distribution 

A discrete random variable (RV) that counts the number of times a certain event will occur in a
specific interval. Characteristics of the variable:

•  The probability that the event occurs in a given interval is the same for all intervals.

•  The events occur with a known mean and independently of the time since the last event.

The distribution is defined by the mean $\mu$ of the event in the interval. Notation: X ~ P($\mu$). The
standard deviation is $\sigma = \sqrt{\mu}$. The probability of having exactly x occurrences in the
interval is $P(X = x) = \frac{e^{-\mu}\mu^{x}}{x!}$. The Poisson distribution is often used to approximate
the binomial distribution, with mean $\mu = np$, when n is "large" and p is "small" (a general rule is that n should be
greater than or equal to 20 and p should be less than or equal to .05).
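
The approximation rule can be checked numerically. The sketch below compares Poisson probabilities with mean np against exact binomial probabilities; the values n = 100 and p = 0.02 are hypothetical and were chosen only because they satisfy the general rule.

```python
import math

def poisson_pmf(x, mu):
    """P(X = x) = e^(-mu) * mu^x / x! for a Poisson RV with mean mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial RV with n trials and success probability p."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 100, 0.02      # hypothetical: n >= 20 and p <= 0.05
mu = n * p            # mean used for the Poisson approximation

largest_gap = max(
    abs(poisson_pmf(x, mu) - binomial_pmf(x, n, p)) for x in range(6)
)
print(largest_gap)
```

For these values the two sets of probabilities differ by less than 0.01 at every x from 0 to 5.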

Population 

The  collection,  or  set,  of  all  individuals,  objects,  or  measurements  whose  properties  are  being 
studied. 

Probability 

A number between 0 and 1, inclusive, that gives the likelihood that a specific event will occur.
The foundation of statistics is given by the following 3 axioms (by A. N. Kolmogorov, 1930s):
Let S denote the sample space and let A and B be two events in S. Then:

•  $0 \le P(A) \le 1$.

•  If A and B are any two mutually exclusive events, then $P(A \text{ or } B) = P(A) + P(B)$.

•  $P(S) = 1$.

Probability  Distribution  Function  (PDF) 

A  mathematical  description  of  a  discrete  random  variable  (RV),  given  either  in  the  form  of  an 
equation  (formula) ,  or  in  the  form  of  a  table  listing  all  the  possible  outcomes  of  an  experiment 
and  the  probability  associated  with  each  outcome. 

Example: A biased coin with probability 0.7 for a head (in one toss of the coin) is tossed 5 times.
We are interested in the number of heads (the RV X = the number of heads). X is Binomial, so
X ~ B(5, 0.7) and $P(X = x) = \binom{5}{x} (0.7)^{x} (0.3)^{5-x}$, or in the form of the table:

x    P(X = x)
0    0.0024
1    0.0284
2    0.1323
3    0.3087
4    0.3602
5    0.1681

Table  4.3
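
The table can be regenerated from the binomial formula. This is a minimal sketch; the function name `binomial_pmf` is an illustrative choice.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# X ~ B(5, 0.7): number of heads in 5 tosses of the biased coin.
table = {x: binomial_pmf(x, 5, 0.7) for x in range(6)}

for x, probability in table.items():
    print(f"{x}    {probability:.4f}")
```

The six probabilities sum to 1, as they must for a probability distribution function.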

Proportion 


•  As  a  number:  A  proportion  is  the  number  of  successes  divided  by  the  total  number  in  the 
sample. 

•  As a probability distribution: Given a binomial random variable (RV), X ~ B(n, p), consider
the ratio of the number X of successes in n Bernoulli trials to the number n of trials: $P' = \frac{X}{n}$.
This new RV is called a proportion, and if the number of trials, n, is large enough, P'

Q  Qualitative  Data 
See  Data. 
Quantitative  Data 
Quartiles 

The  numbers  that  separate  the  data  into  quarters.  Quartiles  may  or  may  not  be  part  of  the  data. 
The  second  quartile  is  the  median  of  the  data. 

R  Random  Variable  (RV) 
see  Variable 
Relative  Frequency 

The  ratio  of  the  number  of  times  a  value  of  the  data  occurs  in  the  set  of  all  outcomes  to  the 
number  of  all  outcomes. 

S  Sample

A portion of the population under study. A sample is representative if it characterizes the
population being studied.

Sample  Space 

The  set  of  all  possible  outcomes  of  an  experiment. 
Standard  Deviation 

A number that is equal to the square root of the variance and measures how far data values are
from their mean. Notation: s for sample standard deviation and $\sigma$ for population standard
deviation.

Standard  Error  of  the  Mean 

The standard deviation of the distribution of the sample means, $\frac{\sigma}{\sqrt{n}}$.

Standard  Normal  Distribution 

A continuous random variable (RV) X ~ N(0, 1). When X follows the standard normal
distribution, it is often denoted as Z ~ N(0, 1).


Statistic

A numerical characteristic of the sample. A statistic estimates the corresponding population
parameter. For example, the average number of full-time students in a 7:30 a.m. class for this
term (statistic) is an estimate for the average number of full-time students in any class this term
(parameter).

Student's-t  Distribution 

Investigated and reported by William S. Gosset in 1908 and published under the pseudonym
Student. The major characteristics of the random variable (RV) are:

•  It is continuous and assumes any real values.

•  The pdf is symmetrical about its mean of zero. However, it is more spread out and flatter at
the apex than the normal distribution.

•  It approaches the standard normal distribution as n gets larger.

•  There is a "family" of t distributions: every representative of the family is completely
defined by the number of degrees of freedom which is one less than the number of data.

Student-t  Distribution 
T  Tree  Diagram 

The  useful  visual  representation  of  a  sample  space  and  events  in  the  form  of  a  "tree"  with 
branches  marked  by  possible  outcomes  simultaneously  with  associated  probabilities 
(frequencies,  relative  frequencies). 

Type  1  Error 

The decision is to reject the null hypothesis when, in fact, the null hypothesis is true.

Type  2  Error

The decision is to not reject the null hypothesis when, in fact, the null hypothesis is false.

U  Uniform  Distribution 

A continuous random variable (RV) that has equally likely outcomes over the domain,
$a < x < b$. Often referred to as the Rectangular distribution because the graph of the pdf has the
form of a rectangle. Notation: X ~ U(a, b). The mean is $\mu = \frac{a+b}{2}$ and the standard deviation is
$\sigma = \sqrt{\frac{(b-a)^2}{12}}$. The probability density function is $f(x) = \frac{1}{b-a}$ for $a < x < b$ or $a \le x \le b$. The
cumulative distribution is $P(X \le x) = \frac{x-a}{b-a}$.
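
The uniform formulas translate directly into code. In this sketch the function names and the interval from 0 to 12 are hypothetical choices for illustration.

```python
import math

def uniform_mean(a, b):
    """Mean of U(a, b): (a + b) / 2."""
    return (a + b) / 2

def uniform_sd(a, b):
    """Standard deviation of U(a, b): sqrt((b - a)^2 / 12)."""
    return math.sqrt((b - a) ** 2 / 12)

def uniform_cdf(x, a, b):
    """P(X <= x) = (x - a) / (b - a), clamped to 0 below a and 1 above b."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)

a, b = 0, 12   # hypothetical interval
print(uniform_mean(a, b), uniform_sd(a, b), uniform_cdf(3, a, b))
```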
V  Variable  (Random  Variable)

A characteristic of interest in a population being studied. Common notation for variables is
upper case Latin letters X, Y, Z, ...; common notation for a specific value from the domain (set of
all possible values of a variable) is lower case Latin letters x, y, z, .... For example, if X is the
number of children in a family, then x represents a specific integer 0, 1, 2, 3, .... Variables in
statistics differ from variables in intermediate algebra in the following two ways.

•  The  domain  of  the  random  variable  (RV)  is  not  necessarily  a  numerical  set;  the  domain  may 
be  expressed  in  words;  for  example,  if  X  =  hair  color  then  the  domain  is  {black,  blond,  gray, 
green,  orange}. 

•  We  can  tell  what  specific  value  x  of  the  Random  Variable  X  takes  only  after  performing  the 
experiment. 

Variance 

Mean of the squared deviations from the mean. Square of the standard deviation. For a set of
data, a deviation can be represented as $x - \bar{x}$ where x is a value of the data and $\bar{x}$ is the sample
mean. The sample variance is equal to the sum of the squares of the deviations divided by the
difference of the sample size and 1.
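
The verbal recipe for the sample variance can be followed step by step and compared with `statistics.variance` from the Python standard library, which also divides by the sample size minus 1. The data values and the function name `sample_variance` are hypothetical.

```python
import math
import statistics

def sample_variance(data):
    """Sum of squared deviations from the sample mean, divided by n - 1."""
    n = len(data)
    mean = sum(data) / n
    return sum((x - mean) ** 2 for x in data) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]     # hypothetical sample with mean 5

variance = sample_variance(data)    # 32 / 7
std_dev = math.sqrt(variance)       # sample standard deviation s
print(variance, std_dev, statistics.variance(data))
```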

Venn  Diagram 

The  visual  representation  of  a  sample  space  and  events  in  the  form  of  circles  or  ovals  showing 
their  intersections. 

Z  z-score 

The linear transformation of the form $z = \frac{x - \mu}{\sigma}$. If this transformation is applied to any normal
distribution X ~ N($\mu$, $\sigma$), the result is the standard normal distribution Z ~ N(0, 1). If this
transformation is applied to any specific value x of the RV with mean $\mu$ and standard deviation
$\sigma$, the result is called the z-score of x. Z-scores allow us to compare data that are normally
distributed but scaled differently.
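
To illustrate how z-scores put differently scaled data on a common footing, the sketch below compares two scores from two exams. The exam means, standard deviations, and scores are hypothetical.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (or below) the mean."""
    if sigma <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mu) / sigma

# Hypothetical exams: A has mean 75 and sd 5; B has mean 80 and sd 10.
z_a = z_score(85, 75, 5)    # score of 85 on exam A
z_b = z_score(90, 80, 10)   # score of 90 on exam B
print(z_a, z_b)  # 2.0 1.0
```

Although 90 is the higher raw score, the 85 on exam A is the stronger result relative to its own distribution.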
URL: http://cnx.org/content/m16014/1.17/
Pages: 24-31

Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Sampling and Data: Variation and Critical Evaluation"
Used here as: "Variation"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16021/1.15/
Pages: 31-33
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Sampling and Data: Answers and Rounding Off"
Used here as: "Answers and Rounding Off"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16006/1.8/
Page: 33
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Sampling and Data: Frequency, Relative Frequency, and Cumulative Frequency"
Used here as: "Frequency"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16012/1.20/
Pages: 33-37
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Sampling and Data: Summary"
Used here as: "Summary"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16023/1.10/
Page: 38
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Sampling and Data: Practice 1"
Used here as: "Practice: Sampling and Data"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16016/1.16/
Pages: 39-41
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Sampling and Data: Homework"
Used here as: "Homework"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16010/1.19/
Pages: 42-49
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Sampling and Data: Data Collection Lab I"
Used here as: "Lab 1: Data Collection"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16004/1.11/
Pages: 50-51
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/

Module: "Sampling and Data: Sampling Experiment Lab II"
Used here as: "Lab 2: Sampling Experiment"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16013/1.15/
Pages: 52-54
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Introduction"
Used here as: "Descriptive Statistics"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16300/1.9/
Page: 59
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Displaying Data"
Used here as: "Displaying Data"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16297/1.9/
Pages: 59-60
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Stem and Leaf Graphs (Stemplots), Line Graphs and Bar Graphs"
Used here as: "Stem and Leaf Graphs (Stemplots), Line Graphs and Bar Graphs"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16849/1.17/
Pages: 60-63
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Histogram"
Used here as: "Histograms"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16298/1.14/
Pages: 63-67
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Box Plot"
Used here as: "Box Plots"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16296/1.13/
Pages: 68-71
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Measuring the Location of the Data"
Used here as: "Measures of the Location of the Data"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16314/1.18/
Pages: 71-76
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Measuring the Center of the Data"
Used here as: "Measures of the Center of the Data"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17102/1.13/
Pages: 76-79
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Skewness and the Mean, Median, and Mode"
Used here as: "Skewness and the Mean, Median, and Mode"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17104/1.9/
Pages: 79-80
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Measuring the Spread of the Data"
Used here as: "Measures of the Spread of the Data"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17103/1.15/
Pages: 81-88
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Summary of Formulas"
Used here as: "Summary of Formulas"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16310/1.9/
Page: 89
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Practice 1"
Used here as: "Practice 1: Center of the Data"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16312/1.12/
Pages: 90-92
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/

Module: "Descriptive Statistics: Practice 2"
Used here as: "Practice 2: Spread of the Data"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17105/1.12/
Pages: 93-94
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Homework"
Used here as: "Homework"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16801/1.25/
Pages: 95-111
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Descriptive Statistics: Descriptive Statistics Lab"
Used here as: "Lab: Descriptive Statistics"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16299/1.13/
Pages: 112-113
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Introduction"
Used here as: "Probability Topics"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16838/1.11/
Pages: 121-122
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Terminology"
Used here as: "Terminology"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16845/1.13/
Pages: 122-124
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Independent & Mutually Exclusive Events"
Used here as: "Independent and Mutually Exclusive Events"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16837/1.14/
Pages: 124-127
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Two Basic Rules of Probability"
Used here as: "Two Basic Rules of Probability"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16847/1.11/
Pages: 127-131
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Contingency Tables"
Used here as: "Contingency Tables"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16835/1.12/
Pages: 131-134
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Venn Diagrams (optional)"
Used here as: "Venn Diagrams (optional)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16848/1.12/
Pages: 134-135
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Tree Diagrams (optional)"
Used here as: "Tree Diagrams (optional)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16846/1.10/
Pages: 135-139
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/

Module: "Probability Topics: Summary of Formulas"
Used here as: "Summary of Formulas"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16843/1.5/
Page: 140
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Practice"
Used here as: "Practice 1: Contingency Tables"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16839/1.11/
Pages: 141-142
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Practice II"
Used here as: "Practice 2: Calculating Probabilities"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16840/1.12/
Page: 143
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Homework"
Used here as: "Homework"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16836/1.21/
Pages: 144-154
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Probability Topics: Review"
Used here as: "Review"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16842/1.9/
Pages: 155-156
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/

Module: "Probability Topics: Probability Lab"
Used here as: "Lab: Probability Topics"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16841/1.15/
Pages: 157-159
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/

Module: "Discrete Random Variables: Introduction"
Used here as: "Discrete Random Variables"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16825/1.14/
Pages: 167-168
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Probability Distribution Function (PDF) for a Discrete Random Variable"
Used here as: "Probability Distribution Function (PDF) for a Discrete Random Variable"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16831/1.14/
Pages: 168-169
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Mean or Expected Value and Standard Deviation"
Used here as: "Mean or Expected Value and Standard Deviation"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16828/1.16/
Pages: 169-172
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Common Discrete Probability Distribution Functions"
Used here as: "Common Discrete Probability Distribution Functions"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16821/1.6/
Page: 172
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Binomial"
Used here as: "Binomial"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16820/1.17/
Pages: 172-175
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Geometric (optional)"
Used here as: "Geometric (optional)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16822/1.16/
Pages: 175-177
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Hypergeometric (optional)"
Used here as: "Hypergeometric (optional)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16824/1.16/
Pages: 178-180
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Poisson (optional)"
Used here as: "Poisson"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16829/1.16/
Pages: 180-182
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Summary of the Discrete Probability Functions"
Used here as: "Summary of Functions"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16833/1.11/
Pages: 183-184
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Practice 1: Discrete Distributions"
Used here as: "Practice 1: Discrete Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16830/1.14/
Page: 185
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Practice 2: Binomial Distribution"
Used here as: "Practice 2: Binomial Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17107/1.18/
Pages: 186-187
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Practice 3: Poisson Distribution"
Used here as: "Practice 3: Poisson Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17109/1.15/
Page: 188
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Practice 4: Geometric Distribution"
Used here as: "Practice 4: Geometric Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17108/1.17/
Pages: 189-190
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Practice 5: Hypergeometric Distribution"
Used here as: "Practice 5: Hypergeometric Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17106/1.13/
Page: 191
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Homework"
Used here as: "Homework"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16823/1.20/
Pages: 192-201
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Review"
Used here as: "Review"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16832/1.11/
Pages: 202-204
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Lab 1"
Used here as: "Lab 1: Discrete Distribution (Playing Card Experiment)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16827/1.12/
Pages: 205-208
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Discrete Random Variables: Lab II"
Used here as: "Lab 2: Discrete Distribution (Lucky Dice Experiment)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16826/1.12/
Pages: 209-212
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Continuous Random Variables: Introduction"
Used here as: "Continuous Random Variables"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16808/1.12/
Pages: 221-223
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Continuous Random Variables: Continuous Probability Functions"
Used here as: "Continuous Probability Functions"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16805/1.9/
Pages: 223-225
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Continuous Random Variables: The Uniform Distribution"
Used here as: "The Uniform Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16819/1.17/
Pages: 226-233
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Continuous Random Variables: The Exponential Distribution"
Used here as: "The Exponential Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16816/1.15/
Pages: 233-238
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Continuous Random Variables: Summary of The Uniform and Exponential Probability Distributions"
Used here as: "Summary of the Uniform and Exponential Probability Distributions"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16813/1.10/
Page: 239
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/

Module: "Continuous Random Variables: Practice 1"
Used here as: "Practice 1: Uniform Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16812/1.14/
Pages: 240-242
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Continuous Random Variables: Practice 2"
Used here as: "Practice 2: Exponential Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16811/1.11/
Pages: 243-244
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Continuous Random Variables: Homework"
Used here as: "Homework"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16807/1.14/
Pages: 245-250
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/


Available for free at Connexions <http://cnx.org/content/col10522/1.40>

Module: "Continuous Random Variables: Review"
Used here as: "Review"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16810/1.11/
Pages: 251-253
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Continuous Random Variables: Lab I"
Used here as: "Lab: Continuous Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16803/1.13/
Pages: 254-256
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/

Module: "Normal Distribution: Introduction"
Used here as: "The Normal Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16979/1.12/
Pages: 261-262
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Standard Normal Distribution"
Used here as: "The Standard Normal Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16986/1.7/
Page: 262
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/

Module: "Normal Distribution: Z-scores"
Used here as: "Z-scores"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16991/1.10/
Pages: 263-264
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Areas to the Left and Right of x"
Used here as: "Areas to the Left and Right of x"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16976/1.5/
Page: 265
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/


Module: "Normal Distribution: Calculations of Probabilities"
Used here as: "Calculations of Probabilities"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16977/1.12/
Pages: 265-268
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Summary of Formulas"
Used here as: "Summary of Formulas"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16987/1.5/
Page: 269
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Practice"
Used here as: "Practice: The Normal Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16983/1.10/
Pages: 270-271
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Homework"
Used here as: "Homework"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16978/1.20/
Pages: 272-277
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/

Module: "Normal Distribution: Review"
Used here as: "Review"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16985/1.10/
Pages: 278-279
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Normal Distribution: Normal Distribution Lab I"
Used here as: "Lab 1: Normal Distribution (Lap Times)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16981/1.18/
Pages: 280-282
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/




Module: "Normal Distribution: Normal Distribution Lab II"
Used here as: "Lab 2: Normal Distribution (Pinkie Length)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16980/1.16/
Pages: 283-284
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Introduction"
Used here as: "The Central Limit Theorem"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16953/1.17/
Pages: 289-290
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Central Limit Theorem for Sample Means"
Used here as: "The Central Limit Theorem for Sample Means (Averages)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16947/1.23/
Pages: 290-292
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Central Limit Theorem for Sums"
Used here as: "The Central Limit Theorem for Sums"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16948/1.16/
Pages: 293-294
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Using the Central Limit Theorem"
Used here as: "Using the Central Limit Theorem"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16958/1.21/
Pages: 294-300
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Summary of Formulas"
Used here as: "Summary of Formulas"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16956/1.8/
Page: 301
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/


Module: "Central Limit Theorem: Practice"
Used here as: "Practice: The Central Limit Theorem"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16954/1.12/
Pages: 302-304
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Homework"
Used here as: "Homework"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16952/1.24/
Pages: 305-311
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Review"
Used here as: "Review"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16955/1.12/
Pages: 312-313
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Central Limit Theorem Lab I"
Used here as: "Lab 1: Central Limit Theorem (Pocket Change)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16950/1.10/
Pages: 314-317
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Central Limit Theorem: Central Limit Theorem Lab II"
Used here as: "Lab 2: Central Limit Theorem (Cookie Recipes)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16945/1.11/
Pages: 318-322
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Introduction"
Used here as: "Confidence Intervals"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16967/1.16/
Pages: 327-329
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/


Module: "Confidence Intervals: Confidence Interval, Single Population Mean, Population Standard Deviation Known, Normal"
Used here as: "Confidence Interval, Single Population Mean, Population Standard Deviation Known, Normal"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16962/1.23/
Pages: 329-336
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Confidence Interval, Single Population Mean, Standard Deviation Unknown, Student's-t"
Used here as: "Confidence Interval, Single Population Mean, Standard Deviation Unknown, Student's-t"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16959/1.24/
Pages: 336-339
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Confidence Interval for a Population Proportion"
Used here as: "Confidence Interval for a Population Proportion"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16963/1.20/
Pages: 339-343
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Summary of Formulas"
Used here as: "Summary of Formulas"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16973/1.8/
Page: 344
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Practice 1"
Used here as: "Practice 1: Confidence Intervals for Means, Known Population Standard Deviation"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16970/1.13/
Pages: 345-346
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Practice 2"
Used here as: "Practice 2: Confidence Intervals for Means, Unknown Population Standard Deviation"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16971/1.14/
Pages: 347-348
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/


Module: "Confidence Intervals: Practice 3"
Used here as: "Practice 3: Confidence Intervals for Proportions"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16968/1.13/
Pages: 349-350
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Homework"
Used here as: "Homework"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16966/1.16/
Pages: 351-360
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Review"
Used here as: "Review"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16972/1.10/
Pages: 361-363
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Confidence Interval Lab I"
Used here as: "Lab 1: Confidence Interval (Home Costs)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16960/1.11/
Pages: 364-366
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Confidence Interval Lab II"
Used here as: "Lab 2: Confidence Interval (Place of Birth)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16961/1.11/
Pages: 367-368
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Confidence Intervals: Confidence Interval Lab III"
Used here as: "Lab 3: Confidence Interval (Womens' Heights)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16964/1.12/
Pages: 369-370
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/


Module: "Hypothesis Testing of Single Mean and Single Proportion: Introduction"
Used here as: "Hypothesis Testing: Single Mean and Single Proportion"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16997/1.11/
Pages: 377-378
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Null and Alternate Hypotheses"
Used here as: "Null and Alternate Hypotheses"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16998/1.14/
Pages: 378-379
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Outcomes and the Type I and Type II Errors"
Used here as: "Outcomes and the Type I and Type II Errors"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17006/1.8/
Pages: 379-380
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Distribution Needed for Hypothesis Testing"
Used here as: "Distribution Needed for Hypothesis Testing"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17017/1.13/
Pages: 380-381
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Assumptions"
Used here as: "Assumption"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17002/1.16/
Page: 381
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Rare Events"
Used here as: "Rare Events"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16994/1.8/
Page: 381
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/


Module: "Hypothesis Testing of Single Mean and Single Proportion: Using the Sample to Test the Null Hypothesis"
Used here as: "Using the Sample to Test the Null Hypothesis"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16995/1.17/
Pages: 382-383
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Decision and Conclusion"
Used here as: "Decision and Conclusion"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16992/1.11/
Page: 383
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Additional Information"
Used here as: "Additional Information"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16999/1.13/
Pages: 383-384
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Summary of the Hypothesis Test"
Used here as: "Summary of the Hypothesis Test"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16993/1.6/
Page: 385
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Examples"
Used here as: "Examples"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17005/1.25/
Pages: 385-395
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Summary of Formulas"
Used here as: "Summary of Formulas"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16996/1.9/
Page: 396
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/


Module: "Hypothesis Testing of Single Mean and Single Proportion: Practice 1"
Used here as: "Practice 1: Single Mean, Known Population Standard Deviation"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17004/1.11/
Pages: 397-398
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Practice 2"
Used here as: "Practice 2: Single Mean, Unknown Population Standard Deviation"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17016/1.13/
Pages: 399-400
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Practice 3"
Used here as: "Practice 3: Single Proportion"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17003/1.15/
Pages: 401-402
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Homework"
Used here as: "Homework"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17001/1.14/
Pages: 403-415
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Review"
Used here as: "Review"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17013/1.12/
Pages: 416-418
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Single Mean and Single Proportion: Lab"
Used here as: "Lab: Hypothesis Testing of a Single Mean and Single Proportion"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17007/1.12/
Pages: 419-422
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/


Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Introduction"
Used here as: "Hypothesis Testing: Two Population Means and Two Population Proportions"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17029/1.9/
Pages: 429-430
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Comparing Two Independent Population Means with Unknown Population Standard Deviations"
Used here as: "Comparing Two Independent Population Means with Unknown Population Standard Deviations"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17025/1.18/
Pages: 430-433
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Comparing Two Independent Population Means with Known Population Standard Deviations"
Used here as: "Comparing Two Independent Population Means with Known Population Standard Deviations"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17042/1.10/
Pages: 433-435
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Comparing Two Independent Population Proportions"
Used here as: "Comparing Two Independent Population Proportions"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17043/1.12/
Pages: 435-437
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Matched or Paired Samples"
Used here as: "Matched or Paired Samples"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17033/1.15/
Pages: 437-441
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/


Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Summary of Types of Hypothesis Tests"
Used here as: "Summary of Types of Hypothesis Tests"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17044/1.5/
Page: 442
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Practice 1"
Used here as: "Practice 1: Hypothesis Testing for Two Proportions"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17027/1.13/
Pages: 443-444
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing: Two Population Means and Two Population Proportions: Practice 2"
Used here as: "Practice 2: Hypothesis Testing for Two Averages"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17039/1.12/
Pages: 445-446
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Two Means and Two Proportions: Homework"
Used here as: "Homework"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17023/1.21/
Pages: 447-458
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Two Means and Two Proportions: Review"
Used here as: "Review"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17021/1.9/
Pages: 459-460
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Hypothesis Testing of Two Means and Two Proportions: Lab I"
Used here as: "Lab: Hypothesis Testing for Two Means and Two Proportions"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17022/1.13/
Pages: 461-465
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/


Module: "The Chi-Square Distribution: Introduction"
Used here as: "The Chi-Square Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17048/1.9/
Pages: 471-472
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Notation"
Used here as: "Notation"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17052/1.6/
Page: 472
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Facts About The Chi-Square Distribution"
Used here as: "Facts About the Chi-Square Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17045/1.6/
Pages: 472-473
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Goodness-of-Fit Test"
Used here as: "Goodness-of-Fit Test"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17192/1.8/
Pages: 474-481
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Test of Independence"
Used here as: "Test of Independence"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17191/1.12/
Pages: 481-485
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Test for Homogeneity"
By: Susan Dean
URL: http://cnx.org/content/m43655/1.2/
Pages: 485-487
Copyright: Susan Dean
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Comparison Summary of the Chi-Square Tests Goodness-of-Fit, Independence and Homogeneity"
By: Susan Dean
URL: http://cnx.org/content/m43654/1.2/
Page: 488
Copyright: Susan Dean
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Test of a Single Variance"
Used here as: "Test of a Single Variance"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17059/1.7/
Pages: 488-490
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Summary of Formulas"
Used here as: "Summary of Formulas"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17058/1.8/
Page: 491
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Practice 1"
Used here as: "Practice 1: Goodness-of-Fit Test"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17054/1.12/
Pages: 492-493
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Practice 2"
Used here as: "Practice 2: Contingency Tables"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17056/1.13/
Pages: 494-495
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Practice 3"
Used here as: "Practice 3: Test of a Single Variance"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17053/1.8/
Pages: 496-497
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Homework"
Used here as: "Homework"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17028/1.20/
Pages: 498-506
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/


Module: "The Chi-Square Distribution: Review"
Used here as: "Review"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17057/1.11/
Pages: 507-510
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Lab I"
Used here as: "Lab 1: Chi-Square Goodness-of-Fit"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17049/1.9/
Pages: 511-515
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "The Chi-Square Distribution: Lab II"
Used here as: "Lab 2: Chi-Square Test for Independence"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17050/1.11/
Pages: 516-517
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Introduction"
Used here as: "Linear Regression and Correlation"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17089/1.6/
Page: 523
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Linear Equations"
Used here as: "Linear Equations"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17086/1.4/
Pages: 523-525
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/

Module: "Linear Regression and Correlation: Slope and Y-Intercept of a Linear Equation"
Used here as: "Slope and Y-Intercept of a Linear Equation"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17083/1.5/
Page: 525
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/2.0/


Module: "Linear Regression and Correlation: Scatter Plots"
Used here as: "Scatter Plots"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17082/1.8/
Pages: 526-527
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: The Regression Equation"
Used here as: "The Regression Equation"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17090/1.15/
Pages: 528-533
Copyright: Maxfield Foundation
License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Correlation Coefficient and Coefficient of Determination"
Used here as: "Correlation Coefficient and Coefficient of Determination"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17092/1.12/
Pages: 534-535

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Testing the Significance of the Correlation Coefficient"
Used here as: "Testing the Significance of the Correlation Coefficient"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17077/1.15/
Pages: 536-540

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Prediction"

Used here as: "Prediction"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17095/1.8/

Page: 541

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Outliers"

Used here as: "Outliers"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17094/1.14/

Pages: 541-547

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/


Module: "Linear Regression and Correlation: 95% Critical Values of the Sample Correlation Coefficient Table"

Used here as: "95% Critical Values of the Sample Correlation Coefficient Table"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17098/1.6/
Pages: 548-550

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Summary"

Used here as: "Summary"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17081/1.4/

Page: 551

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Linear Regression and Correlation: Practice"
Used here as: "Practice: Linear Regression"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17088/1.12/

Pages: 552-554

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Homework"

Used here as: "Homework"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17085/1.14/

Pages: 555-570

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Regression Lab I"
Used here as: "Lab 1: Regression (Distance from School)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17080/1.11/
Pages: 571-573

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Linear Regression and Correlation: Regression Lab II"
Used here as: "Lab 2: Regression (Textbook Cost)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17087/1.9/
Pages: 574-575

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/


Module: "Linear Regression and Correlation: Regression Lab III"
Used here as: "Lab 3: Regression (Fuel Efficiency)"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17079/1.8/
Pages: 576-578

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "F Distribution and One-Way ANOVA: Introduction"
Used here as: "F Distribution and One-Way ANOVA"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17065/1.11/
Page: 583

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and One-Way ANOVA: Purpose and Basic Assumptions of One-Way ANOVA"

Used here as: "One-Way ANOVA"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17068/1.10/

Pages: 584-585

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and One-Way ANOVA: The F Distribution And The F Ratio"
Used here as: "The F Distribution and the F Ratio"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17076/1.14/
Pages: 585-589

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and One-Way ANOVA: Facts About the F Distribution"
Used here as: "Facts About the F Distribution"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17062/1.14/
Pages: 589-593

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and One-Way ANOVA: Test of Two Variances"
Used here as: "Test of Two Variances"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17075/1.8/
Pages: 593-595

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/


Module: "F Distribution and One-Way ANOVA: Summary"

Used here as: "Summary"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17072/1.4/

Page: 596

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and One-Way ANOVA: Practice"
Used here as: "Practice: One-Way ANOVA"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17067/1.10/
Pages: 597-598

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and One-Way ANOVA: Homework"

Used here as: "Homework"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17063/1.15/

Pages: 599-603

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and One-Way ANOVA: Review"

Used here as: "Review"

By: Susan Dean, Barbara Illowsky, Ph.D.

URL: http://cnx.org/content/m17070/1.9/

Pages: 604-607

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "F Distribution and One-Way ANOVA: One-Way ANOVA Lab"
Used here as: "Lab: One-Way ANOVA"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17061/1.9/
Pages: 608-609

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Collaborative Statistics: Practice Final Exam 1"
Used here as: "Practice Final Exam 1"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16304/1.17/
Pages: 613-621

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/


Module: "Collaborative Statistics: Practice Final Exam 2"
Used here as: "Practice Final Exam 2"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m16303/1.16/
Pages: 622-631

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Collaborative Statistics: Data Sets"
Used here as: "Data Sets"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17132/1.5/
Pages: 632-634

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/3.0/

Module: "Collaborative Statistics: Projects: Univariate Data"
Used here as: "Group Project: Univariate Data"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17142/1.8/
Pages: 635-636

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Projects: Continuous Distributions & Central Limit Theorem"
Used here as: "Group Project: Continuous Distributions and Central Limit Theorem"
By: Susan Dean, Barbara Illowsky, Ph.D.
URL: http://cnx.org/content/m17141/1.9/
Pages: 637-639

Copyright: Maxfield Foundation

License: http://creativecommons.org/licenses/by/2.0/

Module: "Collaborative Statistics: Projects: Hypothesis Testing Article"